Comparing rules and machine learning natural language processing approaches

Introduction

In my past posts, I showed how to create a simple NLP model with rules and with machine learning, and how to improve each of those models. In this post I will compare the two approaches and give advice on when each one is appropriate.

Overall comparison

The following table lays out some key points to consider when picking a natural language processing approach.

	Rules-based	Machine-learning-based
What people and skills are needed to be successful?	Expert rules developers to write the rules. Subject matter experts (SMEs) to advise the rules writers.	Subject matter experts (SMEs) to annotate the documents. A chief annotator to advise and manage the subject matter experts.
How much effort is required for a prototype?	Little - a skilled rule writer can very quickly generate some results with one or two rules.	There is an up-front cost to train the SMEs in the use of the training tool. Afterwards, initial results could come within hours.
How is accuracy improved?	By writing more/better rules.	By changing the training data, usually by adding more.
If the model makes a mistake, how can it be fixed?	Write a new rule. This increases the complexity of the model. It's possible the rule will not generalize and will only fix one case.	Add the sentence/document with the mistake to the training data. There is no guarantee that the model will get this sentence/document right the next time, or the model may make a mistake somewhere else. Generally, the model improves overall with more training data.
When does accuracy improvement reach diminishing returns?	When the number and complexity of rules becomes difficult to maintain.	When adding new training documents fails to improve accuracy.
How much subject matter expert (SME) time is required?	Some: the SMEs should periodically advise the rules developers.	A lot: ideally the SMEs do all of the annotation themselves. Otherwise, the SMEs need to train and advise the human annotators.
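
The "diminishing returns" row for machine learning can be turned into a simple check. The following is a minimal sketch, not a standard method: the function name, the `min_gain` threshold, and the `patience` window are all assumptions chosen for illustration, and the accuracy values you pass in would come from your own evaluation after each round of added training data.

```python
def reached_diminishing_returns(accuracies, min_gain=0.005, patience=2):
    """Return True when each of the last `patience` training rounds
    improved accuracy by less than `min_gain`.

    `accuracies` holds one accuracy value per training round, in order.
    The default thresholds are illustrative assumptions, not recommendations.
    """
    if len(accuracies) <= patience:
        # Not enough rounds yet to judge the trend.
        return False
    recent_gains = [
        accuracies[i] - accuracies[i - 1]
        for i in range(len(accuracies) - patience, len(accuracies))
    ]
    return all(gain < min_gain for gain in recent_gains)
```

If the check returns True, more annotation of the same kind is unlikely to pay off, and it may be time to look at different training data or at targeted rules instead.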

The best of both worlds?

It is possible to leverage the advantages of both approaches while avoiding some of the pitfalls. The most common way is to prototype a model with rules and then use that prototype to generate training data for a machine learning model. IBM Watson Explorer can be used to annotate documents with rules and can export these annotated documents into a Watson Knowledge Studio instance in exactly this manner. This approach saves some annotation effort for your SMEs, who may have only a limited amount of time to dedicate to annotation.
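
To make the idea concrete, here is a minimal sketch of using rules to pre-annotate documents as training data. It does not use the Watson Explorer or Watson Knowledge Studio APIs; the `RULES` patterns, the function names, and the output format are all hypothetical and only illustrate the general technique.

```python
import re

# Hypothetical rules: each maps an entity label to a regular expression.
RULES = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "MONEY": re.compile(r"\$\d+(?:\.\d{2})?"),
}


def annotate(text):
    """Return rule-generated annotations as (start, end, label) tuples,
    sorted by position in the text."""
    spans = []
    for label, pattern in RULES.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def build_training_data(documents):
    """Pair every document with its rule-generated annotations."""
    return [{"text": doc, "entities": annotate(doc)} for doc in documents]


if __name__ == "__main__":
    docs = ["Invoice dated 2017-03-20 for $45.00.", "No entities here."]
    for example in build_training_data(docs):
        print(example)
```

In practice the SMEs would still review and correct these pre-annotations before training, but correcting is usually faster than annotating from scratch.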

Additionally, you can augment machine learning models with rules. If you have a pattern that absolutely must be interpreted in a certain way, rules are the only way to guarantee that you get the interpretation you want. (With machine learning, training should approach that interpretation but is never guaranteed to get there.) This is a very common pattern which lets you get the overall benefits of machine learning while avoiding very targeted mistakes.
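
A minimal sketch of this pattern follows. Everything in it is an assumption for illustration: `model_predict` is a stub standing in for a real trained model, and the single "must" rule and its label are made up.

```python
import re

# Hypothetical hard rule: text containing a form of the verb "cancel"
# must always be labeled CANCEL, whatever the model says.
MUST_RULES = [
    (re.compile(r"\bcancel(?:led|ed|ling|ing|s)?\b", re.IGNORECASE), "CANCEL"),
]


def model_predict(text):
    """Stand-in for a trained statistical model; returns (label, confidence).

    A real model would be called here. This stub always gives the same answer.
    """
    return ("OTHER", 0.62)


def classify(text, predict=model_predict):
    """Let the hard rules decide when they match; otherwise trust the model.

    Returns (label, source), where source is "rule" or "model".
    """
    for pattern, label in MUST_RULES:
        if pattern.search(text):
            return label, "rule"
    label, _confidence = predict(text)
    return label, "model"
```

Returning the source alongside the label makes it easy to see later how often the rules are overriding the model, which is a useful signal that the training data may need attention.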

Conclusion

There are very good reasons to use both rules-based and machine-learning-based natural language processing techniques. Each of these techniques has benefits and pitfalls. Fortunately, the techniques can be combined in a best-of-both-worlds kind of way. The approach you use depends heavily on what skills and resources you have available and what tradeoffs you are willing to make.

6 Comments

Pingback: Demo of natural language processing with rules and machine-learning based approaches – Freedville Blog
Rob Murgai says:

March 20, 2017 at 4:17 pm

Andrew, I like the merged, best-of-both-worlds idea. Is there a way to use the rules in post-processing as well? As in, use some rules to pre-process and help with the pre-annotations, and at run time use the stat model but also add a post-processing rules option to mitigate some of the run-time issues?

1. freedvil_wp_admin says:
  
  March 21, 2017 at 2:13 am
  
  You can certainly run post-processing code to refine the statistical output, and you probably need to if you have edge cases. You can easily imagine a scenario where you trust the statistical model except when X, Y, and Z occur together – only a rule is guaranteed to catch that.
  
Pingback: Chatbot Texterkennung - Machine Learning oder Regelbasierend
Pingback: Improving simple natural language processing models with rules or machine learning – Freedville Blog
Pingback: Setting Thresholds: Or, what kind of mistakes do I want to make? – Freedville Blog
