{"id":17,"date":"2016-10-31T02:45:26","date_gmt":"2016-10-31T02:45:26","guid":{"rendered":"http:\/\/freedville.com\/blog\/?p=17"},"modified":"2016-10-31T02:46:06","modified_gmt":"2016-10-31T02:46:06","slug":"feature-engineering-lessons-learned-while-calculating-natural-language-processing-confidence","status":"publish","type":"post","link":"https:\/\/freedville.com\/blog\/2016\/10\/31\/feature-engineering-lessons-learned-while-calculating-natural-language-processing-confidence\/","title":{"rendered":"Feature engineering lessons learned while calculating natural language processing confidence"},"content":{"rendered":"<p>In my post on\u00a0<a href=\"http:\/\/freedville.com\/blog\/2016\/10\/31\/calculating-confidence-of-natural-language-processing-results-using-machine-learning\/\">Calculating confidence of natural language processing results using machine learning<\/a>\u00a0I described building a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Logistic_regression\">logistic regression<\/a> model using two simple features. \u00a0I learned a few things and made a few mistakes along the way, and I&#8217;ll describe both\u00a0here.<\/p>\n<p>The first was that my <a href=\"https:\/\/en.wikipedia.org\/wiki\/Logit\">logit function<\/a> (b0 + b1*x1 + &#8230; + bn*xn, where b0 is the intercept, b<em>i<\/em> are the feature coefficients, and x<em>i<\/em> the feature scores) had negative coefficients. \u00a0It&#8217;s not wrong to have negative coefficients, but it does mean that the associated feature correlates <strong>negatively<\/strong>\u00a0with producing correct outcomes. \u00a0With a negative coefficient on a feature that scores 0-1, if that feature has the best possible score then the overall confidence in the result goes <strong>down<\/strong>. \u00a0In this case you can try to &#8220;fix&#8221; the feature scoring, or simply remove the feature from consideration.<\/p>\n<p>A similar lesson comes with small coefficients. 
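To make the negative-coefficient lesson concrete, here is a minimal sketch; the coefficient values below are invented for illustration and are not from my actual model:

```python
import math

def sigmoid(z):
    # Map a logit value to a confidence in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def confidence(coeffs, intercept, scores):
    # logit = b0 + b1*x1 + ... + bn*xn
    z = intercept + sum(b * x for b, x in zip(coeffs, scores))
    return sigmoid(z)

# Feature x2 has a negative coefficient (b2 = -2, invented for
# illustration), so confidence drops as its score improves:
low  = confidence([6, -2], -3, [1.0, 0.0])  # x2 at its worst score
high = confidence([6, -2], -3, [1.0, 1.0])  # x2 at its best score
print(round(low, 3), round(high, 3))        # confidence falls from ~0.95 to ~0.73
```

Seeing the confidence move in the wrong direction like this is usually the clearest signal that the feature's scoring needs fixing, or that the feature should be dropped.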
\u00a0If my logit function is -3 + 6*x1 + 0.003*x2, then x2 has very little predictive power. \u00a0Depending on the cost to calculate features, I may just want to drop the calculation of x2, as it can only move my prediction by a minuscule fraction.<\/p>\n<p>The final lesson is to do a sanity check on your logit function. \u00a0If you follow my advice of scaling all feature scores from 0 to 1, with 0 always produced by the worst possible input and 1 by the best possible input, you should hope for the constant term (b0) to be approximately -3 and the sum of all coefficients (b0+b1+&#8230;+bn) to be approximately +3. \u00a0(These come from passing all 0s, then all 1s, respectively, to your logit function.) \u00a0Logit values of -3 and +3 map to roughly 5% and 95% confidence in the classified result (using the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Sigmoid_function\">sigmoid function<\/a> to determine confidence).<\/p>\n<p>This sanity check makes intuitive sense. \u00a0If an input you have never seen before generates the worst possible feature scores, you should have very low confidence that the associated answer will be correct. \u00a05% is even optimistic here. \u00a0On the other hand, for an input which generates perfect feature scores, your predicted accuracy tops out in the high 90s percentage-wise. \u00a0Intuitively it seems wrong to give 100% confidence to almost any prediction.<\/p>\n<p>If your logit function does <strong>not<\/strong> range from about -3 to +3, then machine learning is telling you that either the features you have selected are not good features, or that you lack training data. \u00a0In my case I was testing up to three\u00a0features on hundreds of data rows, which is sufficient for a decent prediction. \u00a0My logit ranged from -2 to +0.4, indicating that the model would never be more than about 60% confident in its prediction based on my input features. 
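The -3\/+3 sanity-check arithmetic is easy to verify numerically; a minimal sketch using the standard sigmoid definition, not tied to any particular ML library:

```python
import math

def sigmoid(z):
    # Standard logistic sigmoid: maps a logit to a confidence in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

# Worst-case input (all feature scores 0) and best-case input (all 1s)
# for a well-behaved logit spanning roughly -3 to +3:
print(round(sigmoid(-3), 3))   # ~0.047, about 5% confidence
print(round(sigmoid(3), 3))    # ~0.953, about 95% confidence

# A logit that tops out at +0.4 caps confidence at about 60%:
print(round(sigmoid(0.4), 3))  # ~0.599
```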
\u00a0It turned out I had a bug in my feature scorer, and I was thus generating non-predictive data. \u00a0I knew the bug was fixed when I retrained the model and the logit ranged from -3 to +3.<\/p>\n<p>After running through these lessons, I built a model I am pretty happy with. \u00a0As is always the case with machine learning, I intend to revisit this model and retrain it once my input data changes significantly.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In my post on\u00a0Calculating confidence of natural language processing results using machine learning\u00a0I described building a logistic regression model using two simple features. \u00a0I learned a few things and made a few mistakes along the&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[3,2],"_links":{"self":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/17"}],"collection":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/comments?post=17"}],"version-history":[{"count":1,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/17\/revisions"}],"predecessor-version":[{"id":18,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/17\/revisions\/18"}],"wp:attachment":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/media?parent=17"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/categories?post=17"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/tags?post=17"}],"curies":[{"name":"w
p","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}