{"id":34,"date":"2016-11-12T01:35:13","date_gmt":"2016-11-12T01:35:13","guid":{"rendered":"http:\/\/freedville.com\/blog\/?p=34"},"modified":"2019-04-12T14:14:10","modified_gmt":"2019-04-12T14:14:10","slug":"training-a-personal-chatbot-with-watson-developer-cloud-apis-part-2","status":"publish","type":"post","link":"https:\/\/freedville.com\/blog\/2016\/11\/12\/training-a-personal-chatbot-with-watson-developer-cloud-apis-part-2\/","title":{"rendered":"Training a personal chatbot with Watson Developer Cloud APIs &#8211; Part 2"},"content":{"rendered":"<p>In part 1 of&nbsp;<a href=\"http:\/\/freedville.com\/blog\/2016\/11\/09\/training-a-personal-chatbot-with-watson-developer-cloud-apis\/\">Training a personal chatbot with Watson Developer Cloud APIs<\/a> I gave an intro to text classification and showed how you could use it to create a chat bot that screens incoming questions based on whether or not they relate to lunch.<\/p>\n<p><strong>Covering more topics with classification<\/strong><\/p>\n<p>My first classification scheme was relatively simple &#8211; &#8220;lunch or not lunch?&#8221; &nbsp;This was a very coarse classification scheme but still presented challenges in gathering useful training data. &nbsp;The end result is not quite useful enough for a chat bot &#8211; I can hardly give an interesting response to a question knowing only that it pertains to lunch. &nbsp;The two most important aspects of lunch are &#8220;when&#8221; and &#8220;where&#8221;. &nbsp;Let&#8217;s consider the possibilities.<\/p>\n<p>Given a question about lunch, it could be a &#8220;when should we go&#8221; question, a &#8220;where should we go&#8221; question, a &#8220;when <em>and<\/em> where should we go&#8221; question, or an &#8220;other&#8221; lunch question. &nbsp;Thus, I could reasonably sub-divide my &#8220;lunch&#8221; classification into four sub-classifications.<\/p>\n<p>Alternatively, I could think about classifying in orthogonal dimensions. 
&nbsp;My first dimension could be &#8220;lunch vs other&#8221;, the second &#8220;when vs not_when&#8221;, and the third &#8220;where vs not_where&#8221;. &nbsp;(Or, my second dimension could be &#8220;when vs where vs neither&#8221;.) &nbsp;Then, I could classify a piece of text against two dimensions simultaneously and take the intersection. &nbsp;If the first classifier says it&#8217;s a &#8220;lunch&#8221; question and the second says it&#8217;s a &#8220;when&#8221; question, then the intersection tells me it&#8217;s a &#8220;when should we lunch&#8221; question.<\/p>\n<p><strong>General or specific classifiers &#8211; which are better?<\/strong><\/p>\n<p>Should you prefer very specific classifiers on a single dimension, or more general classifiers on multiple dimensions? &nbsp;A classic &#8220;it depends&#8221; question! &nbsp;You should be able to get good results with either approach, if you have sufficient training data. &nbsp;(This is a big if; we&#8217;ll come back to it.) &nbsp;Specific classifiers should in theory be more precise, if trained well. &nbsp;There is a significant benefit to the &#8216;general&#8217; classifiers method, however. &nbsp;Consider how many topics you want your chat bot to handle. &nbsp;Let&#8217;s say you want your chat bot to handle not just lunch arrangements but also regular business meetings. &nbsp;Business meetings will also include where and when aspects. &nbsp;If you use &#8216;general&#8217; classifiers, you have the opportunity to reuse existing classifiers and will have less training to do overall.<\/p>\n<p><strong>What does the data allow?<\/strong><\/p>\n<p>The philosophical questions sometimes have to take a backseat to what the data allows. &nbsp;Recall that my initial possible training data set is 66,000 questions pulled from my chat transcripts. &nbsp;Let&#8217;s see what the data allows in terms of lunch-related questions. 
&nbsp;For a shortcut, I will run a grep for the term lunch and the seven restaurants my team visits the most:<\/p>\n<p><code>egrep -i \"lunch|carmen|moe|lime|greek|brixx|randys|kabob\" SametimeQuestions.txt | wc -l<\/code><\/p>\n<p>This yields approximately 100 questions. &nbsp;Through a quick scan, only 9 appear to be &#8220;when&#8221; questions. &nbsp;A similar number are &#8220;where&#8221; questions. &nbsp;(The vast majority are simply &#8220;interested in lunch?&#8221; questions, or not even related to a lunch invitation.) &nbsp;Nine examples are not going to be enough for classification; I don&#8217;t want to consider a classifier trained on fewer than 20 examples, and I don&#8217;t want to generate synthetic examples since I know they won&#8217;t be representative of questions I will actually receive. &nbsp;Thus the data appears to be forcing our hand.<\/p>\n<p>It is worth verifying: do we have enough &#8220;when&#8221; and &#8220;where&#8221; questions to classify? I again use a simple grep as a proxy for estimating possible training data. Most invitations, lunch or otherwise, happen on 15-minute intervals, so checking the &#8220;minutes&#8221; value of the clock plus the &#8220;when&#8221; keyword is a good proxy for how many &#8220;when&#8221; questions I will have. For &#8220;where&#8221; questions, I use the obvious keyword plus a handful of common locations.<\/p>\n<p><code>egrep -i \"00|15|30|45|when\" SametimeQuestions.txt | wc -l<br \/>\negrep -i \"where|carmen|moe|lime|greek|brixx|randys|kabob\" SametimeQuestions.txt | wc -l<\/code><\/p>\n<p>These queries yield approximately 800 and 500 questions, respectively. Thus, I should have plenty of training data available to build multiple, generalized classifiers.<\/p>\n<p><strong>Training data for two dimensions<\/strong><\/p>\n<p>At a high level, the training exercise looks a lot like what we did in the previous post; we just have more work to do since we are covering more dimensions. 
&nbsp;Just grab questions and manually classify them. &nbsp;To train against &#8220;when&#8221;\/&#8220;not when&#8221;, I added 70 questions from my &#8220;00|15|30|45|when&#8221; question set, being sure to include questions that used times or the word &#8220;when&#8221; in a way that did NOT indicate a when question.<\/p>\n<p>I added these questions to the same spreadsheet as my first training data, simply adding a column to represent when\/not-when. &nbsp;Thus my columns are question, lunch\/other, when\/not-when. &nbsp;Note that this requires additional labeling: adding when\/not-when to my original lunch\/other questions, and adding lunch\/other to the new when\/not-when questions. &nbsp;However, this is largely an Excel &#8220;fill&#8221; exercise &#8211; most of the original questions were not-when, and most of the new questions are other (not-lunch).<\/p>\n<p>Here&#8217;s a sample of the new training data (full data here: <a href=\"http:\/\/freedville.com\/blog\/wp-content\/uploads\/2016\/11\/AndrewChatsNLCTraining_2d.csv\">Lunch vs Other and When vs Not-When NLC Training Data<\/a>):<br \/>\n<code>lunch @ 1130?,lunch,when<br \/>\nare you going out for lunch today?,lunch,not_when<br \/>\nAndrew- u have extra shinguard?,other,not_when<br \/>\nlet's chat after I grab some lunch .. you have some time this afternoon?,other,when<br \/>\ndo we have a room for the 1:30?,other,not_when<br \/>\ndoes this happen when you build and deploy?,other,not_when<br \/>\nwhen is your next meeting?,other,when<\/code><br \/>\nThere are a few possibilities for how to proceed, depending on whether we want to use multiple Natural Language Classifier instances (one per dimension) or to &#8220;hack&#8221; Natural Language Classifier by forcing it to run multi-dimensional classification in a single instance.<\/p>\n<p><strong>Using two classifiers<\/strong><\/p>\n<p>The first method is to use two classifiers. 
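At query time, the plan is to ask both classifiers the same question and intersect their top classes. Here is a minimal sketch of that combination step (not the exact code from this post) — the response shapes mirror the Natural Language Classifier JSON, and the confidence values below are illustrative, not real output:

```python
# Combine two one-dimension classifier responses into a two-dimensional
# answer. Response shapes follow the NLC "classes" list; the confidence
# values here are illustrative stand-ins.

def top_class(response):
    """Return (class_name, confidence) for the highest-confidence class."""
    best = max(response["classes"], key=lambda c: c["confidence"])
    return best["class_name"], best["confidence"]

def combine(lunch_response, when_response):
    """Intersect the two dimensions, e.g. ('lunch', 'when')."""
    lunch_label, lunch_conf = top_class(lunch_response)
    when_label, when_conf = top_class(when_response)
    # Use the weaker of the two confidences as a conservative overall score.
    return (lunch_label, when_label), min(lunch_conf, when_conf)

# Illustrative responses for "lunch @ 1145?"
lunch_response = {"classes": [
    {"class_name": "lunch", "confidence": 0.98},
    {"class_name": "other", "confidence": 0.02},
]}
when_response = {"classes": [
    {"class_name": "when", "confidence": 0.865},
    {"class_name": "not_when", "confidence": 0.135},
]}

label, score = combine(lunch_response, when_response)
print(label, score)  # ('lunch', 'when') 0.865
```

Taking the minimum of the two confidences is one conservative design choice; if you treat the dimensions as independent you might multiply them instead.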
&nbsp;I can take my 2d training file and alternately delete the &#8220;lunch&#8221; or &#8220;when&#8221; column, to create two training files, each covering a single dimension. &nbsp;See&nbsp;<a href=\"http:\/\/freedville.com\/blog\/wp-content\/uploads\/2016\/11\/AndrewChatsNLCTraining_when.csv\">When vs Not-When NLC Training Data<\/a>.<\/p>\n<p>After training my &#8220;when&#8221; classifier, I ask it the same questions I asked in my <a href=\"http:\/\/freedville.com\/blog\/2016\/11\/09\/training-a-personal-chatbot-with-watson-developer-cloud-apis\/\">previous post<\/a>. &nbsp;The &#8220;lunch @ 1145&#8221; question classifies to &#8220;when&#8221; with 86.5% confidence. &nbsp;Thus my two classifiers tell me with very high confidence that this is a &#8220;when lunch&#8221; question. &nbsp;&#8220;sure, when do you need it by?&#8221; classifies with 99.5% confidence to &#8220;when&#8221; and 99.4% to &#8220;other&#8221; &#8211; clearly a &#8220;when not-lunch&#8221; question. &nbsp;&#8220;does it work when you restart the build?&#8221; classifies to &#8220;not-when&#8221; with 84.5% confidence.<\/p>\n<p>From these results you can see that the &#8220;when\/not-when&#8221; classifier performs very well, even better than the lunch\/other classifier described in my previous post. &nbsp;This is not surprising when you consider it received twice as much training data.<\/p>\n<p>Thus, by using two classifiers and combining their results, you can get high confidence on each dimension and make a very good guess at a question&#8217;s true two-dimensional classification.<\/p>\n<p><strong>Multiple dimensions in one classifier<\/strong><\/p>\n<p>I can upload my two-dimensional training data directly into a single classifier. 
&nbsp;When I run a question against it, I get interesting results:<\/p>\n<p><code>curl -G -u \"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx\":\"xxxxxxxxxxxx\" https:\/\/gateway.watsonplatform.net\/natural-language-classifier\/api\/v1\/classifiers\/xxxxxxxxxx-nlc-xxxx\/classify --data-urlencode \"text=lunch @ 1145?\"<br \/>\n{<br \/>\n\"classifier_id\" : \"xxxxxxxxxx-nlc-xxxx\",<br \/>\n\"url\" : \"https:\/\/gateway.watsonplatform.net\/natural-language-classifier\/api\/v1\/classifiers\/8aff06x106-nlc-13700\",<br \/>\n\"text\" : \"lunch @ 1145?\",<br \/>\n\"top_class\" : \"lunch\",<br \/>\n\"classes\" : [ {<br \/>\n\"class_name\" : \"lunch\",<br \/>\n\"confidence\" : 0.8798495105446424<br \/>\n}, {<br \/>\n\"class_name\" : \"when\",<br \/>\n\"confidence\" : 0.06570656619039483<br \/>\n}, {<br \/>\n\"class_name\" : \"not_when\",<br \/>\n\"confidence\" : 0.04098451924057577<br \/>\n}, {<br \/>\n\"class_name\" : \"other\",<br \/>\n\"confidence\" : 0.013459404024386896<br \/>\n} ]<br \/>\n}<\/code><\/p>\n<p>We can view this question as a &#8220;lunch&#8221; and &#8220;when&#8221; question because they are the highest confidence classifications. 
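One way to read both dimensions out of a merged response like the one above is to group the returned classes by dimension and take the top of each group. A small sketch, with the confidences copied from the response above and the dimension membership hard-coded from the training labels:

```python
# Split the overloaded classifier's flat class list back into the two
# dimensions and take the top class of each. Confidences are copied from
# the "lunch @ 1145?" response shown above.

DIMENSIONS = {
    "lunch_vs_other": {"lunch", "other"},
    "when_vs_not_when": {"when", "not_when"},
}

classes = [
    {"class_name": "lunch",    "confidence": 0.8798495105446424},
    {"class_name": "when",     "confidence": 0.06570656619039483},
    {"class_name": "not_when", "confidence": 0.04098451924057577},
    {"class_name": "other",    "confidence": 0.013459404024386896},
]

def top_per_dimension(classes, dimensions):
    """Return the best class and its confidence within each dimension."""
    result = {}
    for name, members in dimensions.items():
        candidates = [c for c in classes if c["class_name"] in members]
        best = max(candidates, key=lambda c: c["confidence"])
        result[name] = (best["class_name"], best["confidence"])
    return result

tops = top_per_dimension(classes, DIMENSIONS)
print(tops)
# "lunch" wins its dimension easily, but "when" beats "not_when"
# with only ~6.6% absolute confidence.
```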
&nbsp;But we lose significant fidelity against one of our dimensions, since the classifier tries to pick a primary classification.<\/p>\n\n<table id=\"tablepress-4\" class=\"tablepress tablepress-id-4\">\n<thead>\n<tr class=\"row-1 odd\">\n\t<th class=\"column-1\">Question<\/th><th class=\"column-2\">Lunch confidence<\/th><th class=\"column-3\">Other confidence<\/th><th class=\"column-4\">When confidence<\/th><th class=\"column-5\">Not-when confidence<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-hover\">\n<tr class=\"row-2 even\">\n\t<td class=\"column-1\">sure, when do you need it by?<\/td><td class=\"column-2\">96.5%<\/td><td class=\"column-3\">0.3%<\/td><td class=\"column-4\">2.78%<\/td><td class=\"column-5\">0.5%<\/td>\n<\/tr>\n<tr class=\"row-3 odd\">\n\t<td class=\"column-1\">talk after lunch?<\/td><td class=\"column-2\">64.0%<\/td><td class=\"column-3\">10.6%<\/td><td class=\"column-4\">22.0%<\/td><td class=\"column-5\">3.5%<\/td>\n<\/tr>\n<tr class=\"row-4 even\">\n\t<td class=\"column-1\">o'charleys?<\/td><td class=\"column-2\">7.5%<\/td><td class=\"column-3\">2.8%<\/td><td class=\"column-4\">88.2%<\/td><td class=\"column-5\">1.5%<\/td>\n<\/tr>\n<tr class=\"row-5 odd\">\n\t<td class=\"column-1\">production?<\/td><td class=\"column-2\">4.2%<\/td><td class=\"column-3\">4.1%<\/td><td class=\"column-4\">1.3%<\/td><td class=\"column-5\">90.2%<\/td>\n<\/tr>\n<tr class=\"row-6 even\">\n\t<td class=\"column-1\">are we going to lunch today?<\/td><td class=\"column-2\">90.4%<\/td><td class=\"column-3\">0.8%<\/td><td class=\"column-4\">0.7%<\/td><td class=\"column-5\">8.1%<\/td>\n<\/tr>\n<tr class=\"row-7 odd\">\n\t<td class=\"column-1\">does it work when you restart the build?<\/td><td class=\"column-2\">0.2%<\/td><td class=\"column-3\">97.8%<\/td><td class=\"column-4\">0.8%<\/td><td class=\"column-5\">1.1%<\/td>\n<\/tr>\n<tr class=\"row-8 even\">\n\t<td class=\"column-1\">what time is lunch?<\/td><td class=\"column-2\">3.4%<\/td><td class=\"column-3\">6.1%<\/td><td 
class=\"column-4\">88.7%<\/td><td class=\"column-5\">1.7%<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>We can see this in the &#8220;O&#8217;Charley&#8217;s&#8221; (a restaurant we visit) and &#8220;production&#8221; questions. &nbsp;The highest lunch\/other classification for both is &#8220;lunch&#8221;. &nbsp;But how confident can we feel in the result? &nbsp;The problem is that all classifier confidences must add up to 100%, and the highest-matching classification takes up so much of the confidence pie that little is left to delineate the second dimension. &nbsp;If we had tried to force three dimensions of classification into this classifier, the additional dimensions would be even harder to tease out of the classifier.<\/p>\n<p><strong>How many classifiers are best?<\/strong><\/p>\n<p>Simple classifiers can be called in parallel and their results can be combined in an intuitive way. &nbsp;Alternatively, one classifier can be overloaded to produce multiple dimensions. &nbsp;Overloading one classifier may simplify classifier management, and it surely cuts down on our API usage bill, but it makes it harder to get a true read of the confidence on all of the classification dimensions. &nbsp;Your mileage may vary, but I prefer the higher fidelity of using multiple&nbsp;classifiers.<\/p>\n<p><strong>Conclusion<\/strong><\/p>\n<p>You can classify text against multiple dimensions, assuming those dimensions are orthogonal. &nbsp;This can be accomplished using specific multi-dimensional classifications (e.g. when_lunch) or multiple, general classifications (e.g. when, lunch). &nbsp;Multiple, general classifications can be achieved with a single classifier or with multiple classifiers. 
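If you settle on multiple classifiers, the bookkeeping is light: the &#8220;alternately delete a column&#8221; step described earlier takes only a few lines. A sketch using Python's standard csv module — the file names are hypothetical, and the sample rows mirror the training data shown earlier:

```python
import csv

# Sample rows in the same shape as the two-dimensional training data:
# question, lunch-vs-other label, when-vs-not_when label.
rows = [
    ["lunch @ 1130?", "lunch", "when"],
    ["are you going out for lunch today?", "lunch", "not_when"],
    ["when is your next meeting?", "other", "when"],
]
with open("training_2d.csv", "w", newline="") as f:  # hypothetical file name
    csv.writer(f).writerows(rows)

def split_training_file(src, lunch_out, when_out):
    """Write (question, label) pairs for each dimension to its own file."""
    with open(src, newline="") as f, \
         open(lunch_out, "w", newline="") as lf, \
         open(when_out, "w", newline="") as wf:
        lunch_writer, when_writer = csv.writer(lf), csv.writer(wf)
        for question, lunch_label, when_label in csv.reader(f):
            lunch_writer.writerow([question, lunch_label])
            when_writer.writerow([question, when_label])

split_training_file("training_2d.csv", "training_lunch.csv", "training_when.csv")
```

Each output file is then ready to upload as training data for its own one-dimension classifier.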
&nbsp;Experimentation will help you decide what classification techniques your data requires.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In part 1 of&nbsp;Training a personal chatbot with Watson Developer Cloud APIs I gave an intro to text classification and how you could use it to create a chat bot that screened your incoming questions&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[9,5,4],"_links":{"self":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/34"}],"collection":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/comments?post=34"}],"version-history":[{"count":8,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/34\/revisions"}],"predecessor-version":[{"id":426,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/34\/revisions\/426"}],"wp:attachment":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/media?parent=34"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/categories?post=34"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/tags?post=34"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}