Chatbots and compound intents

Classifiers put input into a single category. But what if that input belongs to two categories?

Chatbot developers want their bots to handle as many types of user input as possible. One core technique in chatbots is to classify user utterances into intents. An intent signifies what the user wants to achieve and drives the way you respond. “I want to cancel my last order” may map to a #cancel_order intent, and “I want to check my points balance” to a #check_balance intent. But what happens if the user expresses two intents in a single utterance, say “I want to cancel my last order and verify my points balance”? We can reasonably assume the user wants both #cancel_order and #check_balance, and we can consider this a compound intent. Let’s dive into this problem space further.

Classifiers like those in chatbots, where the intents are the classes, perform best when there are clear distinctions between target classes. Training on compound intents blurs those distinctions, leading to poorer performance on both the individual intents and the compounds. In the worst case, where all combinations are allowed, the number of classes grows exponentially from n to 2^n. Chat classifiers thus work best when trained on single-intent utterances. What does this mean for compound intents?
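To get a feel for that growth, here is a small sketch that enumerates every combination of intents as its own target class. The four intent names are the ones used in this post's examples; the helper name compound_classes is made up for illustration.

```python
from itertools import combinations

# The four intents used in this post's examples.
intents = ["cancel_order", "check_balance", "modify_order", "payment_inquiry"]

def compound_classes(intents):
    """Every non-empty combination of intents, each treated as one target class."""
    return [
        combo
        for size in range(1, len(intents) + 1)
        for combo in combinations(intents, size)
    ]

classes = compound_classes(intents)
print(len(classes))  # 15, i.e. 2**4 - 1 (2**4 if "no intent at all" is counted too)
```

With only ten intents the same enumeration already yields 1,023 classes, each of which would need its own training examples.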

Let’s assume a classifier that is well trained on single intents. For any utterance, this classifier returns a list of candidate intents, each with an associated confidence. You detect compound intents by “doing math” on the confidence scores. Ideally our example “I want to cancel my last order and verify my points balance” scores high on #cancel_order and #check_balance and low on every other intent, with a clear dividing line between the two groups. The happy path for detecting compound intents has a cluster of high-confidence intents cleanly separated from the low-confidence intents.

True positive example of compound intents

“I want to cancel my last order and verify my points balance”
#cancel_order (confidence 0.9053)
#check_balance (confidence 0.7393)
#modify_order (confidence 0.2704)
#payment_inquiry (confidence 0.2342)
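
One way to “do math” on these scores is to look for the largest drop between neighboring confidences and keep everything above it. The sketch below is only an illustration of that idea, not a recommended implementation: the function name split_on_largest_gap and the min_gap value of 0.3 are assumptions, and it covers the happy path only.

```python
def split_on_largest_gap(scored_intents, min_gap=0.3):
    """Return the intents that sit above the largest drop in confidence.

    scored_intents: list of (intent, confidence) pairs.
    min_gap: hypothetical tuning value; if no drop is at least this large,
             only the top intent is returned.
    """
    ranked = sorted(scored_intents, key=lambda pair: pair[1], reverse=True)
    if len(ranked) < 2:
        return [intent for intent, _ in ranked]

    # Drop in confidence between each intent and the next one down.
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    largest = max(range(len(gaps)), key=gaps.__getitem__)

    if gaps[largest] < min_gap:
        return [ranked[0][0]]
    return [intent for intent, _ in ranked[: largest + 1]]


example = [
    ("#cancel_order", 0.9053),
    ("#check_balance", 0.7393),
    ("#modify_order", 0.2704),
    ("#payment_inquiry", 0.2342),
]
detected = split_on_largest_gap(example)
print(detected)  # ['#cancel_order', '#check_balance']
```

Here the largest drop (about 0.47) falls between #check_balance and #modify_order, so the two intents above it are returned together.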

Our initial hypothesis might be to return all intents over a certain confidence. However, this hypothesis can lead to false positive detection of compounds. When the system is unsure about a single-intent utterance, two relatively high-confidence intents can be clustered incorrectly into a compound intent. Falsely detecting a compound intent leads to a negative user experience.

False positive example of compound intents

“I want to verify my last order is cancelled”
#check_balance (confidence 0.7507)
#cancel_order (confidence 0.7273)
#modify_order (confidence 0.2739)
#payment_inquiry (confidence 0.2340)

In this case the user only wants to check the balance; however, their utterance scores almost as high against order cancellation. The correct intent is still ranked first, but if we are trying to detect compound intents via confidence values, this looks even more like a compound intent than our first example, because the top two scores are closer together!
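
To make the problem concrete, here is a minimal sketch of the “return everything over a certain confidence” hypothesis applied to both examples. The function name intents_over_threshold and the cutoff of 0.7 are assumptions chosen for illustration; the scores are the ones listed in this post.

```python
def intents_over_threshold(scored_intents, threshold=0.7):
    """Naive rule: keep every intent at or above the cutoff (0.7 is an assumed value)."""
    return [intent for intent, confidence in scored_intents if confidence >= threshold]


# "I want to cancel my last order and verify my points balance"
compound_utterance = [
    ("#cancel_order", 0.9053),
    ("#check_balance", 0.7393),
    ("#modify_order", 0.2704),
    ("#payment_inquiry", 0.2342),
]

# "I want to verify my last order is cancelled"
single_utterance = [
    ("#check_balance", 0.7507),
    ("#cancel_order", 0.7273),
    ("#modify_order", 0.2739),
    ("#payment_inquiry", 0.2340),
]

flagged_compound = intents_over_threshold(compound_utterance)
flagged_single = intents_over_threshold(single_utterance)
print(flagged_compound)  # ['#cancel_order', '#check_balance']
print(flagged_single)    # ['#check_balance', '#cancel_order']

# Distance between the top two scores in each example.
gap_compound = compound_utterance[0][1] - compound_utterance[1][1]
gap_single = single_utterance[0][1] - single_utterance[1][1]
print(gap_single < gap_compound)  # True
```

Both utterances come back with two intents, so this rule cannot tell the real compound from the uncertain single intent. The top two scores are actually closer together in the single-intent case, which is why it resembles a compound even more strongly.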

Classification technology is wonderful at quickly sorting large numbers of inputs into buckets after training on a small representative subset. This post explores a boundary condition that chatbot classifiers struggle with. From a design point of view, you need to be aware of what mistakes your system can make and how you will deal with them. If you decide never to attempt compound intent detection, you will live with false negatives (never detecting actual compound intents). If you decide to attempt it, you will introduce false positives (detecting compound intents that never existed). Both decisions affect your chatbot and your overall design.