{"id":137,"date":"2017-02-04T02:15:59","date_gmt":"2017-02-04T02:15:59","guid":{"rendered":"http:\/\/freedville.com\/blog\/?p=137"},"modified":"2019-04-12T14:11:53","modified_gmt":"2019-04-12T14:11:53","slug":"reaching-peak-cognitive-performance-are-we-there-yet","status":"publish","type":"post","link":"https:\/\/freedville.com\/blog\/2017\/02\/04\/reaching-peak-cognitive-performance-are-we-there-yet\/","title":{"rendered":"Reaching peak cognitive performance &#8211; are we there yet?"},"content":{"rendered":"<p>If you&#8217;ve built a cognitive system, you know that it requires a lot of training to reach peak performance. &nbsp;(For details, read&nbsp;<a href=\"http:\/\/freedville.com\/blog\/2016\/12\/15\/why-does-machine-learning-require-so-much-training-data\/\">Why does machine learning require so much data?<\/a>) &nbsp;How much training data do you really need? &nbsp;And when can you stop collecting training data?<\/p>\n<p>The answer is: &#8220;it depends&#8221;. &nbsp;But you can make an educated guess by plotting the performance of your cognitive system against the size of your training set. &nbsp;When performance starts to plateau (at some point, it definitely will), you can consider stopping. &nbsp;After this point the cost of performance improvements greatly increases.<\/p>\n<p>I recently showed you <a href=\"http:\/\/freedville.com\/blog\/2017\/01\/13\/demo-of-natural-language-processing-with-rules-and-machine-learning-based-approaches\/\">how to&nbsp;build an NLP model with machine learning<\/a> and <a href=\"http:\/\/freedville.com\/blog\/2017\/01\/20\/improving-simple-natural-language-processing-models-with-rules-or-machine-learning\/\">how to improve the performance of that model<\/a>. 
&nbsp;I continued training that model for a total of five iterations and plotted the performance as seen below:<\/p>\n<figure id=\"attachment_139\" aria-describedby=\"caption-attachment-139\" style=\"width: 700px\" class=\"wp-caption alignnone\"><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-139 size-large\" src=\"http:\/\/freedville.com\/blog\/wp-content\/uploads\/2017\/02\/f1_wks_model_demo-700x433.png\" alt=\"Plot of F1 score against ground truth word count. Model hits an asymptote around 85% F1 and 12,000 words.\" width=\"700\" height=\"433\" srcset=\"https:\/\/freedville.com\/blog\/wp-content\/uploads\/2017\/02\/f1_wks_model_demo-700x433.png 700w, https:\/\/freedville.com\/blog\/wp-content\/uploads\/2017\/02\/f1_wks_model_demo-300x185.png 300w, https:\/\/freedville.com\/blog\/wp-content\/uploads\/2017\/02\/f1_wks_model_demo-768x475.png 768w, https:\/\/freedville.com\/blog\/wp-content\/uploads\/2017\/02\/f1_wks_model_demo.png 1220w\" sizes=\"(max-width: 700px) 100vw, 700px\" \/><figcaption id=\"caption-attachment-139\" class=\"wp-caption-text\">F1 Score of Machine Learning NLP Model<\/figcaption><\/figure>\n<p>The absolute values on the scales are not as important as the overall shape of the curve. &nbsp;Machine learning can make rapid progress with limited amounts of training data, and can make large jumps in accuracy by going from a tiny data set to one that is merely small. &nbsp;The model above made a huge improvement (16 points on the F1 scale)&nbsp;when the ground truth increased by a modest 1,000 words (a single-spaced, double-sided printed page).<\/p>\n<p>The model above also plateaus at approximately 12,000 words of ground truth. &nbsp;I spent a few evenings curating 8,000 more words of ground truth and was rewarded by a tiny 0.8 point increase in F1. 
&nbsp;Having observed this plateau, I will declare success at 84% F1, which is a fabulous result for my&nbsp;overall level of effort.<\/p>\n<p>Depending on your use case, you may decide it is worth chasing additional accuracy. &nbsp;In my <a href=\"http:\/\/freedville.com\/blog\/2016\/12\/04\/cognitive-system-testing-from-a-to-z\/\">cognitive systems testing series<\/a>, I talked about how to check each component of the system to improve performance, since 100% accuracy is not possible. &nbsp;Don&#8217;t limit yourself to looking at just one system; see if you can get additional information from other systems or subsystems. &nbsp;In my NLP model I only considered plain text from Wikipedia articles, but I could have created a parallel system that looked at the links\/citations within those articles. &nbsp;Or I could decide it is sufficient to identify documents\/paragraphs containing entities I care about.<\/p>\n<p>In conclusion, cognitive performance increases with additional training data until you reach a performance plateau. &nbsp;It&#8217;s a good idea to measure your performance against training data size to determine when you can stop collecting data.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you&#8217;ve built a cognitive system, you know that it requires a lot of training to reach peak performance. &nbsp;(For details, read&nbsp;Why does machine learning require so much data?) 
&nbsp;How much training data do you&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[9,2,6],"_links":{"self":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/137"}],"collection":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/comments?post=137"}],"version-history":[{"count":6,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/137\/revisions"}],"predecessor-version":[{"id":420,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/137\/revisions\/420"}],"wp:attachment":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/media?parent=137"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/categories?post=137"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/tags?post=137"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}