{"id":107,"date":"2017-01-25T14:18:15","date_gmt":"2017-01-25T14:18:15","guid":{"rendered":"http:\/\/freedville.com\/blog\/?p=107"},"modified":"2019-04-12T14:12:23","modified_gmt":"2019-04-12T14:12:23","slug":"comparing-rules-and-machine-learning-natural-language-processing-approaches","status":"publish","type":"post","link":"https:\/\/freedville.com\/blog\/2017\/01\/25\/comparing-rules-and-machine-learning-natural-language-processing-approaches\/","title":{"rendered":"Comparing rules and machine learning natural language processing approaches"},"content":{"rendered":"<p><strong>Introduction<\/strong><\/p>\n<p>In my past posts, I showed both how to <a href=\"http:\/\/freedville.com\/blog\/2017\/01\/13\/demo-of-natural-language-processing-with-rules-and-machine-learning-based-approaches\/\">create a simple NLP model with rules and machine learning<\/a>, and how to <a href=\"http:\/\/freedville.com\/blog\/2017\/01\/20\/improving-simple-natural-language-processing-models-with-rules-or-machine-learning\/\">improve those same rules and machine learning NLP models<\/a>. 
&nbsp;In this post I will compare the two approaches and advise on when each one is appropriate.<\/p>\n<p><strong>Overall comparison<\/strong><\/p>\n<p>The following table lays out some key points to consider when picking a natural language processing approach.<\/p>\n\n<table id=\"tablepress-5\" class=\"tablepress tablepress-id-5\">\n<thead>\n<tr class=\"row-1 odd\">\n\t<th class=\"column-1\"><\/th><th class=\"column-2\">Rules-based<\/th><th class=\"column-3\">Machine-learning-based<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-hover\">\n<tr class=\"row-2 even\">\n\t<td class=\"column-1\">What people and skills are needed to be successful?<\/td><td class=\"column-2\">Expert rules developers to write rules.<br \/>\nSubject matter experts to advise the rules writers.<\/td><td class=\"column-3\">Subject matter experts to annotate all the documents.<br \/>\nA chief annotator to advise and manage the subject matter experts.<\/td>\n<\/tr>\n<tr class=\"row-3 odd\">\n\t<td class=\"column-1\">How much effort is required for a prototype?<\/td><td class=\"column-2\">Little: a skilled rule writer can very quickly generate some results with one or two rules.<\/td><td class=\"column-3\">There is an up-front cost to train the subject matter experts (SMEs) in the use of the training tool.  Afterwards, initial results could come within hours.<\/td>\n<\/tr>\n<tr class=\"row-4 even\">\n\t<td class=\"column-1\">How is accuracy improved?<\/td><td class=\"column-2\">By writing more\/better rules.<\/td><td class=\"column-3\">By changing the training data, usually by adding more.<\/td>\n<\/tr>\n<tr class=\"row-5 odd\">\n\t<td class=\"column-1\">If the model makes a mistake, how can it be fixed?<\/td><td class=\"column-2\">Write a new rule.  This increases the complexity of the model. It's possible the rule will not generalize and will only fix one case.<\/td><td class=\"column-3\">Add the sentence\/document with the mistake to the training data.  
There is no guarantee that the model will get this sentence\/document right the next time, and the model may make a new mistake somewhere else.  Generally, though, the model improves overall with more training data.<\/td>\n<\/tr>\n<tr class=\"row-6 even\">\n\t<td class=\"column-1\">When does accuracy improvement reach diminishing returns?<\/td><td class=\"column-2\">When the number and complexity of rules become difficult to maintain.<\/td><td class=\"column-3\">When adding new training documents no longer improves accuracy.<\/td>\n<\/tr>\n<tr class=\"row-7 odd\">\n\t<td class=\"column-1\">How much subject matter expert (SME) time is required?<\/td><td class=\"column-2\">Some: the SMEs should periodically advise the rules developers.<\/td><td class=\"column-3\">A lot: ideally the SMEs do all of the annotation tasks. Otherwise, the SMEs need to train and advise the human annotators.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<!-- #tablepress-5 from cache -->\n<p><strong>The best of both worlds?<\/strong><\/p>\n<p>It is possible to leverage the advantages of both approaches while avoiding some of the pitfalls. &nbsp;The most common way is to prototype a model with rules, then use that model to generate training data that can be loaded into a machine learning model. &nbsp;IBM Watson Explorer can be used to annotate documents with rules and can export these annotated documents into a Watson Knowledge Studio instance in exactly this manner. &nbsp;This approach saves your SMEs some annotation effort, which matters when they have only a limited amount of time to dedicate to annotation.<\/p>\n<p>Additionally, you can augment machine learning models with rules. &nbsp;If you have a pattern that absolutely must be interpreted in a certain way, rules are the only way to guarantee that you get the interpretation you want. &nbsp;(With machine learning, training should approach that interpretation but is never guaranteed to get there.) 
&nbsp;This is a very common pattern that lets you get the overall benefits of machine learning while preventing specific, known mistakes.<\/p>\n<p><strong>Conclusion<\/strong><\/p>\n<p>There are very good reasons to use both rules-based and machine-learning-based natural language processing techniques. &nbsp;Each of these techniques has benefits and pitfalls. &nbsp;Fortunately, the techniques can be combined to get the best of both worlds. &nbsp;The approach you use depends heavily on the skills and resources you have available and the tradeoffs you are willing to make.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction In my past posts, I showed both how to create a simple NLP model with rules and machine learning, and how to improve those same rules and machine learning NLP models. &nbsp;In this post&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[9,2,4],"_links":{"self":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/107"}],"collection":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/comments?post=107"}],"version-history":[{"count":10,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/107\/revisions"}],"predecessor-version":[{"id":421,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/107\/revisions\/421"}],"wp:attachment":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/media?parent=107"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/categories?post=107"},{"taxonomy":"post_tag","embeddable
":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/tags?post=107"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}