{"id":177,"date":"2017-07-18T02:01:32","date_gmt":"2017-07-18T02:01:32","guid":{"rendered":"http:\/\/freedville.com\/blog\/?p=177"},"modified":"2019-04-12T14:12:46","modified_gmt":"2019-04-12T14:12:46","slug":"balancing-precision-and-recall-in-your-cognitive-system","status":"publish","type":"post","link":"http:\/\/freedville.com\/blog\/2017\/07\/18\/balancing-precision-and-recall-in-your-cognitive-system\/","title":{"rendered":"Balancing precision and recall in your cognitive system"},"content":{"rendered":"<p>In building a cognitive model, it can be difficult to know <a href=\"http:\/\/freedville.com\/blog\/2017\/02\/04\/reaching-peak-cognitive-performance-are-we-there-yet\/\">when it&#8217;s accurate enough<\/a>. &nbsp;(You are measuring your daily performance, right?) &nbsp;We know that cognitive performance often involves making a tradeoff between the competing factors of precision and recall. &nbsp;In this post, I&#8217;ll explore some strategies for dealing with that tradeoff.<\/p>\n<p><strong>Strategy 1: Use different F-scores<\/strong><\/p>\n<p>In most of my posts I describe using F1, but you can use any <a href=\"https:\/\/en.wikipedia.org\/wiki\/F1_score\">F-score<\/a>. &nbsp;F1 gives equal weight to precision and recall. &nbsp;Depending on your application, certain kinds of mistakes are worse than others, and a different F-score may be more appropriate. &nbsp;You can plug other constants into the F-score formula to favor precision or recall.<\/p>\n<p>If you are building an application&nbsp;where failing to surface a potentially correct answer is a big deal (for instance, a search application), you should favor recall by using F2. &nbsp;This punishes false negatives twice as much as false positives. &nbsp;If the worst thing your application can do is produce a false positive, favor precision by using F0.5. 
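As a quick illustration (my own sketch, not from any particular library), the general F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), can be computed directly to see how the choice of beta shifts the score:

```python
# Sketch of the F-beta formula: F_beta = (1 + b^2) * P * R / (b^2 * P + R).
# beta > 1 weights recall more heavily (F2); beta < 1 weights precision (F0.5).
def f_beta(precision, recall, beta):
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when the system gets nothing right
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# A system with high precision (0.9) but low recall (0.5):
print(round(f_beta(0.9, 0.5, 1.0), 3))  # 0.643 -- F1, equal weight
print(round(f_beta(0.9, 0.5, 2.0), 3))  # 0.549 -- F2 penalizes the low recall
print(round(f_beta(0.9, 0.5, 0.5), 3))  # 0.776 -- F0.5 rewards the high precision
```

In practice you would measure precision and recall on a labeled test set first, then pick the beta that matches which mistakes hurt you most.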
&nbsp;If your users demand an even more extreme balance, ratchet up (or down) that F-score coefficient as needed.<\/p>\n<p>You&nbsp;get what you measure, so it is important to decide on the metric that measures success.<\/p>\n<p><strong>Strategy 2: Use multiple layers with different optimizations<\/strong><\/p>\n<p>This is an approach popularized by the <a href=\"https:\/\/www.aaai.org\/Magazine\/Watson\/watson.php\">Jeopardy!-playing Watson system<\/a>. &nbsp;The first layer in that system favored recall, generating multiple hypotheses that were sent to a second layer that merged and ranked them. &nbsp;This second layer favored precision, keeping only the highest-ranked answer (and only if it cleared a confidence threshold).<\/p>\n<p>This approach is adaptable to other cognitive systems. &nbsp;Let&#8217;s assume we have a Product Color annotator that runs on product descriptions and has a 60% F1 score.<\/p>\n<h2 id=\"tablepress-6-name\" class=\"tablepress-table-name tablepress-table-name-id-6\">Example resolution layer with majority voting<\/h2>\n\n<table id=\"tablepress-6\" class=\"tablepress tablepress-id-6\" aria-labelledby=\"tablepress-6-name\">\n<thead>\n<tr class=\"row-1 odd\">\n\t<th class=\"column-1\">Product #<\/th><th class=\"column-2\">Annotation layer results<\/th><th class=\"column-3\">Voting layer result<\/th><th class=\"column-4\">Actual color<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-hover\">\n<tr class=\"row-2 even\">\n\t<td class=\"column-1\">Product 1<\/td><td class=\"column-2\">green, green, purple<\/td><td class=\"column-3\">green<\/td><td class=\"column-4\">green<\/td>\n<\/tr>\n<tr class=\"row-3 odd\">\n\t<td class=\"column-1\">Product 2<\/td><td class=\"column-2\">red, red, blue, red<\/td><td class=\"column-3\">red<\/td><td class=\"column-4\">red<\/td>\n<\/tr>\n<tr class=\"row-4 even\">\n\t<td class=\"column-1\">Product 3<\/td><td class=\"column-2\">yellow, orange<\/td><td class=\"column-3\">-<\/td><td 
class=\"column-4\">yellow<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<!-- #tablepress-6 from cache -->\n<p>We can improve our detection of colors using a resolution layer, in this case simple majority voting. &nbsp;Many other resolution techniques are possible. &nbsp;The resolution layer allows the NLP layer to be a little looser (favoring recall), with tightening (favoring precision) happening later.<\/p>\n<p><strong>Strategy 3: Build a dynamic confidence model<\/strong><\/p>\n<p>Rather than using a one-size-fits-all method to balance precision and recall, you can adapt the balance between them on a case-by-case basis. &nbsp;If you are using rules-based NLP, you could&nbsp;measure the precision of each of your rules and let the best rules &#8216;overrule&#8217; the weakest. &nbsp;You could examine document metadata and decide to trust (or be suspicious of) results from documents of a given type or age.<\/p>\n<p>The model should be tuned from experience, or better yet from actual results, to improve the chances of imperfect tools giving quality results.<\/p>\n<p>This strategy&nbsp;is a variation of Strategy 2 with a much fancier resolution layer. &nbsp;For additional ideas,&nbsp;see my previous post&nbsp;about <a href=\"http:\/\/freedville.com\/blog\/2016\/10\/31\/calculating-confidence-of-natural-language-processing-results-using-machine-learning\/\">building an NLP confidence model<\/a>.<\/p>\n<p><strong>Conclusion<\/strong><\/p>\n<p>There are multiple strategies you can use to balance precision and recall in your cognitive system. &nbsp;First, decide which types of errors&nbsp;are most harmful to your system, then use these strategies to balance precision and recall for optimal results.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In building a cognitive model it can be difficult to know when it&#8217;s accurate enough. &nbsp;(You are measuring your daily performance, right?) 
&nbsp;We know that cognitive performance often involves making a tradeoff between the competing&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[9,2,4,6],"_links":{"self":[{"href":"http:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/177"}],"collection":[{"href":"http:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/comments?post=177"}],"version-history":[{"count":4,"href":"http:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/177\/revisions"}],"predecessor-version":[{"id":417,"href":"http:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/177\/revisions\/417"}],"wp:attachment":[{"href":"http:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/media?parent=177"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/categories?post=177"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/tags?post=177"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}