{"id":7,"date":"2016-10-31T02:16:47","date_gmt":"2016-10-31T02:16:47","guid":{"rendered":"http:\/\/freedville.com\/blog\/?p=7"},"modified":"2016-10-31T02:46:38","modified_gmt":"2016-10-31T02:46:38","slug":"calculating-confidence-of-natural-language-processing-results-using-machine-learning","status":"publish","type":"post","link":"https:\/\/freedville.com\/blog\/2016\/10\/31\/calculating-confidence-of-natural-language-processing-results-using-machine-learning\/","title":{"rendered":"Calculating confidence of natural language processing results using machine learning"},"content":{"rendered":"<p>I&#8217;ve been building a set of natural language processing (NLP) annotators and have frequently been asked about my confidence in the results. \u00a0This is an interesting question with a simple answer and a more complex (though probably more interesting) answer.<\/p>\n<p>The simple answer for &#8220;what confidence should I have in an annotator&#8221; is &#8220;the <strong>precision<\/strong> of that annotator&#8221;. \u00a0(You are measuring precision, right?) \u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Precision_and_recall\">Precision<\/a> is the number of correct answers surfaced by your annotator divided by the total number of answers it gave. \u00a0If an\u00a0annotator gave 4\u00a0answers and 3\u00a0were correct, its\u00a0precision is 75%. 
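<\/p>
<p>As a quick sketch of that calculation (the <code>precision<\/code> helper here is my own illustration, not a standard API):<\/p>

```python
def precision(results):
    # results: list of booleans, True when the annotator's answer was correct
    correct = sum(results)
    return correct / len(results)

# Annotator 1: 3 of 4 answers correct
print(precision([True, True, True, False]))   # 0.75
# Annotator 2: 1 of 4 answers correct
print(precision([True, False, False, False])) # 0.25
```

<p>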
\u00a0Thus for any given answer it provides, without knowing anything else, we should have 75% confidence that the answer is correct.<\/p>\n<h2 id=\"tablepress-1-name\" class=\"tablepress-table-name tablepress-table-name-id-1\">Measuring precision of two sample annotators<\/h2>\n\n<table id=\"tablepress-1\" class=\"tablepress tablepress-id-1\" aria-labelledby=\"tablepress-1-name\">\n<thead>\n<tr class=\"row-1 odd\">\n\t<th class=\"column-1\">Annotator #<\/th><th class=\"column-2\">Answer Correct?<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-hover\">\n<tr class=\"row-2 even\">\n\t<td class=\"column-1\">1<\/td><td class=\"column-2\">Yes<\/td>\n<\/tr>\n<tr class=\"row-3 odd\">\n\t<td class=\"column-1\">1<\/td><td class=\"column-2\">Yes<\/td>\n<\/tr>\n<tr class=\"row-4 even\">\n\t<td class=\"column-1\">1<\/td><td class=\"column-2\">Yes<\/td>\n<\/tr>\n<tr class=\"row-5 odd\">\n\t<td class=\"column-1\">1<\/td><td class=\"column-2\">No<\/td>\n<\/tr>\n<tr class=\"row-6 even\">\n\t<td class=\"column-1\">2<\/td><td class=\"column-2\">Yes<\/td>\n<\/tr>\n<tr class=\"row-7 odd\">\n\t<td class=\"column-1\">2<\/td><td class=\"column-2\">No<\/td>\n<\/tr>\n<tr class=\"row-8 even\">\n\t<td class=\"column-1\">2<\/td><td class=\"column-2\">No<\/td>\n<\/tr>\n<tr class=\"row-9 odd\">\n\t<td class=\"column-1\">2<\/td><td class=\"column-2\">No<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<!-- #tablepress-1 from cache -->\n<p>If you&#8217;re in an 80\/20 situation, you can stop here. \u00a0The precision for an annotator is often the most predictive factor in whether a given answer from that annotator is correct, and it costs very little to generate this confidence. \u00a0But let&#8217;s suppose you have a burning desire to improve on this.<\/p>\n<p>We will use a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Machine_learning\">machine learning<\/a> algorithm called &#8220;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Logistic_regression\">logistic regression<\/a>&#8221; to help find the confidence. 
\u00a0In short, we will come up with a list of variables that we think contribute to whether or not an answer is correct. \u00a0In machine learning parlance, these variables are called &#8220;features&#8221;. \u00a0The process of selecting which features to use is called &#8220;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Feature_engineering\">feature engineering<\/a>&#8221;, which is part art\/part science, and you can spend a lot of time on it. \u00a0For our example, let us build a very simple model using only two features.<\/p>\n<p>The first feature, as you might have guessed, is the precision of an annotator. \u00a0Let&#8217;s assume we have a hypothesis that for our problem domain, correct answers are found earlier in the text rather than later. \u00a0Thus our second feature is &#8220;position\u00a0within document&#8221; (relative to the size of the document). \u00a0If the annotation occurs on the 5th word of 100 in a document, we call that 0.95 ((100-5)\/100).<\/p>\n<p>Quick note on features: several machine learning algorithms work best if your features are all on the same scale. \u00a00-1 is a common scale. \u00a0Additionally, I like to have my features produce 0 in the worst case and 1 in the best case.<\/p>\n<p>Build a CSV file with columns for each feature and a column for the outcome. \u00a0The outcome column is 0 if the annotator&#8217;s answer was wrong, and 1 if it was right. 
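<\/p>
<p>As a minimal sketch of this step, and of the scoring described below (the file name and the <code>confidence<\/code> helper are my own choices; the coefficients are the ones my fit, reported later in this post, produced):<\/p>

```python
import csv
import math

# Rows from the sample-data table: precision (F1), position score (F2), correct (1 or 0)
rows = [
    (0.75, 0.95, 1), (0.75, 0.4, 1), (0.75, 0.5, 1), (0.75, 0.35, 0),
    (0.25, 0.7, 1), (0.25, 0.5, 0), (0.25, 0.9, 0), (0.25, 0.25, 0),
]

# Write the training CSV that the logistic regression will consume
with open('training.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['precision', 'position_score', 'correct'])
    writer.writerows(rows)

# Scoring with a fitted model: these coefficients are the ones the regression
# on this sample data produced (logit = -3.46 + 4.06*F1 + 2.57*F2)
def confidence(f1, f2):
    logit = -3.46 + 4.06 * f1 + 2.57 * f2
    return 1 / (1 + math.exp(-logit))  # sigmoid maps the logit to a probability

print(confidence(0.75, 0.95))  # about 0.88: answer found early in the document
print(confidence(0.75, 0.5))   # about 0.70: answer found at the midpoint
```

<p>Any logistic regression implementation (R&#8217;s glm, scikit-learn, Weka, etc.) can fit the coefficients from this CSV; only the resulting logit function is needed at scoring time.<\/p>
<p>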
\u00a0The logistic regression thus predicts, from the input feature scores, how likely the answer is to be correct.<\/p>\n<h2 id=\"tablepress-2-name\" class=\"tablepress-table-name tablepress-table-name-id-2\">Measuring confidence of annotators using logistic regression - sample data<\/h2>\n\n<table id=\"tablepress-2\" class=\"tablepress tablepress-id-2\" aria-labelledby=\"tablepress-2-name\">\n<thead>\n<tr class=\"row-1 odd\">\n\t<th class=\"column-1\">Precision<\/th><th class=\"column-2\">Position score<\/th><th class=\"column-3\">Correct<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-hover\">\n<tr class=\"row-2 even\">\n\t<td class=\"column-1\">0.75<\/td><td class=\"column-2\">0.95<\/td><td class=\"column-3\">1<\/td>\n<\/tr>\n<tr class=\"row-3 odd\">\n\t<td class=\"column-1\">0.75<\/td><td class=\"column-2\">0.4<\/td><td class=\"column-3\">1<\/td>\n<\/tr>\n<tr class=\"row-4 even\">\n\t<td class=\"column-1\">0.75<\/td><td class=\"column-2\">0.5<\/td><td class=\"column-3\">1<\/td>\n<\/tr>\n<tr class=\"row-5 odd\">\n\t<td class=\"column-1\">0.75<\/td><td class=\"column-2\">0.35<\/td><td class=\"column-3\">0<\/td>\n<\/tr>\n<tr class=\"row-6 even\">\n\t<td class=\"column-1\">0.25<\/td><td class=\"column-2\">0.7<\/td><td class=\"column-3\">1<\/td>\n<\/tr>\n<tr class=\"row-7 odd\">\n\t<td class=\"column-1\">0.25<\/td><td class=\"column-2\">0.5<\/td><td class=\"column-3\">0<\/td>\n<\/tr>\n<tr class=\"row-8 even\">\n\t<td class=\"column-1\">0.25<\/td><td class=\"column-2\">0.9<\/td><td class=\"column-3\">0<\/td>\n<\/tr>\n<tr class=\"row-9 odd\">\n\t<td class=\"column-1\">0.25<\/td><td class=\"column-2\">0.25<\/td><td class=\"column-3\">0<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<!-- #tablepress-2 from cache -->\n<p>For my sample data, the logistic regression produced a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Logit\">logit function<\/a>\u00a0of\u00a0-3.46 + 4.06*F1 + 2.57*F2, where F1 is the annotator precision and F2 is the position within document 
score. \u00a0We then use the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Sigmoid_function\">sigmoid function<\/a>\u00a0(1\/(1+e^(-logit))) to determine the confidence. \u00a0For instance, when our 75% annotator found an answer at the 5th word, our logit is -3.46 + 4.06*.75 + 2.57*.95 = 2.03 with sigmoid (confidence) 1\/(1+e^(-2.03)) = 88.3%. \u00a0When the answer was found at the midpoint of the document, the confidence drops to 70.4%.<\/p>\n<p>Thus in this case, the machine learning algorithm validated our hypothesis. \u00a0Even though the annotator&#8217;s precision was the most significant predictor (it had the highest coefficient in our equation), a more refined confidence value can be generated with additional features.<\/p>\n<p>Which features to try is left as an exercise for the reader, but now that you know the method, experiment away!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve been building a set of natural language processing (NLP) annotators and have frequently been asked about my confidence in the results. 
\u00a0This is an interesting question with a simple answer and a more complex&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[3,2,4],"_links":{"self":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/7"}],"collection":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/comments?post=7"}],"version-history":[{"count":3,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/7\/revisions"}],"predecessor-version":[{"id":15,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/7\/revisions\/15"}],"wp:attachment":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/media?parent=7"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/categories?post=7"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/tags?post=7"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}