Chatbots and compound intents

[Image: rows of card catalog drawers]
Classifiers put input into a single category. But what if that input belongs to two categories?

Chatbot developers want their bots to be able to handle as many types of user input as possible. One core technique in chatbots is to classify user utterances into intents. An intent signifies what the user wants to achieve and drives the way you respond. “I want to cancel my last order” may map to a #cancel_order intent and “I want to check my points balance” may map to a #check_balance intent. But what happens if the user has two intents in a single utterance, say “I want to cancel my last order and verify my points balance”? We can reasonably assume the user wants both #cancel_order and #check_balance, and we could consider this a compound intent. Let’s dive into this problem space further.

Classifiers like those in chatbots train best when there are clear distinctions between the target classes. (Chatbots use classifiers where the intents are the classes.) Training on compound intents blurs the distinctions between the target classes, leading to poorer performance on both the individual intents and the compounds. In the worst case, where all combinations are allowed, complexity increases exponentially from n classes to 2^n classes: ten intents already yield 1,024 possible combinations. Chat classifiers thus work best when trained on single-intent utterances. What does this mean for compound intents?

Let’s assume a classifier well trained on single intents. For any utterance this classifier will return a list of candidate intents with associated confidence scores. The way you detect compound intents is by “doing math” on those confidence scores. Ideally our example “I want to cancel my last order and verify my points balance” scores highly on #cancel_order and #check_balance, scores low on every other possible intent, and there is a clear dividing line between the two groups. The happy path for detecting compound intents is a cluster of high-confidence intents cleanly separated from the low-confidence intents.

True positive example of compound intents

“I want to cancel my last order and verify my points balance”
#cancel_order (confidence 0.9053)
#check_balance (confidence 0.7393)
#modify_order (confidence 0.2704)
#payment_inquiry (confidence 0.2342)
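
As a concrete illustration, here is a minimal sketch of that happy path in plain Python, not any particular chatbot platform's API. The intent names and scores are taken from the example above, and the 0.3 gap used to find the dividing line is an assumed, illustrative value.

```python
# Minimal sketch of "doing math" on confidence scores: walk down the ranked
# list until the confidence drops sharply, and treat everything above that
# drop as the detected intents. More than one survivor = compound intent.
def detect_intents(scored_intents, min_gap=0.3):
    """scored_intents: list of (intent, confidence) sorted by confidence, descending."""
    selected = [scored_intents[0]]
    for prev, curr in zip(scored_intents, scored_intents[1:]):
        if prev[1] - curr[1] >= min_gap:  # found the clear dividing line
            break
        selected.append(curr)
    return selected

scores = [
    ("#cancel_order",    0.9053),
    ("#check_balance",   0.7393),
    ("#modify_order",    0.2704),
    ("#payment_inquiry", 0.2342),
]

detected = detect_intents(scores)
print(detected)                                       # the two high-confidence intents
print("compound" if len(detected) > 1 else "single")  # "compound"
```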

Our initial hypothesis might be to return all intents over a certain confidence. However, this hypothesis can lead to false positive detection of compounds. There can be a single-intent utterance that the system is unsure about, leading to two relatively high-confidence intents being incorrectly clustered into a compound intent. Falsely detecting a compound intent leads to a negative user experience.

False positive example of compound intents

“I want to verify my last order is cancelled”

#check_balance (confidence 0.7507)
#cancel_order (confidence 0.7273)
#modify_order (confidence 0.2739)
#payment_inquiry (confidence 0.2340)

In this case the user has only a single intent, yet the utterance classifies nearly as strongly against order cancellation. The correct intent is still ranked first, but if we are trying to detect compound intents via confidence values, this looks even more like a compound intent than our first example!
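
Running the same kind of confidence math over these scores makes the problem concrete. Below is a minimal sketch of the naive fixed-threshold rule described earlier, with an assumed cutoff of 0.7; it cheerfully reports a compound intent even though the user asked for one thing.

```python
# The naive "return every intent over a fixed confidence" rule.
# The 0.7 cutoff is an assumed, illustrative value, not a tuned number.
THRESHOLD = 0.7

scores = [
    ("#check_balance",   0.7507),
    ("#cancel_order",    0.7273),
    ("#modify_order",    0.2739),
    ("#payment_inquiry", 0.2340),
]

detected = [intent for intent, conf in scores if conf >= THRESHOLD]
print(detected)           # ['#check_balance', '#cancel_order']
print(len(detected) > 1)  # True: a false positive compound intent
```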

Classification technology is wonderful at quickly sorting large numbers of inputs into buckets after training on a small, representative subset. This post explores a boundary condition that chatbot classifiers struggle with. From a design point of view you need to be aware of which mistakes your system can make and how you will deal with them. If your decision is to never attempt the detection of compound intents, you will live with false negatives (never detecting actual compound intents). If you decide to attempt compound intent detection, you will introduce false positives (detecting compound intents that never existed). Both decisions affect your chatbot and your overall design.

2 Comments

  1. Hi Andrew,

    Great post!

    Use of linguistic methods for pre-processing a sentence is challenging as well, as this presupposes good grammar on the part of the user.

    Nearly every demo I have seen shows a well-formed sentence that could be parsed into a phrase-structure tree and split evenly among clauses, with well-formed triples on each side, each contributing to a logical intent. Yet many users prefer to string imperatives together without the syntactic glue that makes this linguistic analysis possible.

    It’s been a while since I have worked on a purely linguistic solution in this space, but I recall pre-processing often made under-formed sentences easier to parse. For example, a short imperative “change the disc” (open class / closed class / open class) would cause NN vs VB confusion on “change”. Modifying a sentence like this to “please change the disc” gave the parser enough context to tag “change” as VB. Still not a scalable solution in the long run, but pre-processing is still a tactical technique for dealing with poor grammar in some cases.

    Another technique is to explore the business processes behind multiple intents and use these to help nudge the decision between multiple vs single intents. In your example, there may be a subject matter expert or log data to demonstrate that #cancel_order -> #check_balance is not unlikely.
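
    For example, a minimal sketch of that kind of nudging, with made-up prior values and the intent names from your post:

    ```python
    # Hypothetical numbers throughout: blend the classifier's confidence in a
    # second intent with how often that pair actually co-occurs in chat logs
    # or in the documented business process.
    co_occurrence_prior = {
        ("#cancel_order", "#check_balance"): 0.35,  # plausible follow-on action
        ("#cancel_order", "#modify_order"):  0.05,  # rarely seen together
    }

    def compound_score(first, second, confidence, prior, default_prior=0.01):
        """Nudge the compound decision by the likelihood of the intent pair."""
        return confidence[second] * prior.get((first, second), default_prior)

    confidence = {"#cancel_order": 0.9053, "#check_balance": 0.7393, "#modify_order": 0.2704}
    print(compound_score("#cancel_order", "#check_balance", confidence, co_occurrence_prior))
    print(compound_score("#cancel_order", "#modify_order",  confidence, co_occurrence_prior))
    ```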

    There’s no perfect solution. We’ve explored the use of badges and gamification with a subset of the user base. For these users, we permit the system more ambiguity and generate log data as a basis for training.

    Thanks again for your post – always a great read!

    Kind Regards,
    Craig Trim
