
Intent Recognition in Chatbot Development

Intent recognition is one of the essential functions in chatbots. It labels text into categories for further action. For example, “Good morning” can be labeled with the intent of Greeting. This article shares an overview of how it works and its pros and cons.

By Paul Anton
May 26, 2021

What is intent recognition?

Intent recognition is one of the text-understanding tasks solved within dialogue-processing pipelines. In order for intelligent chatbots to make sense of user input, text utterances must first undergo a series of transformations, including tokenization, POS tagging, intent recognition, and named entity recognition, among others. Intent recognition, also known as intent classification, is the natural language understanding (NLU) task of classifying user utterances into predefined intents.

In chatbot development, intent recognition groups together words and sentences with the same meaning so that the chatbot can generate useful responses. For example, two different sentences—“When is the latest date I can get a vaccination appointment?” and “Do you have a free slot today?”—can both be classified with the CheckSlot intent. This allows all inputs represented by the same intent to be processed and responded to in the same way, such as, “The next available slot is May 21 at 10 AM. Confirm to book?” By mapping free-form utterances into classes of intents, chatbot systems can handle the task efficiently as a multiclass classification problem.
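As a toy illustration, the mapping from a recognized intent to a canned response can be sketched in a few lines of Python. The intent names and responses echo the article's examples; the hard-coded intent stands in for the output of a trained classifier.

```python
# Hypothetical intent-to-response table; in a real system the intent
# comes from an intent classifier, not a hard-coded value.
INTENT_RESPONSES = {
    "CheckSlot": "The next available slot is May 21 at 10 AM. Confirm to book?",
    "Greeting": "Good morning! How can I help you?",
}

def respond(intent: str) -> str:
    """Return the canned response for a recognized intent."""
    return INTENT_RESPONSES.get(intent, "Sorry, I didn't understand that.")

# Two different utterances classified with the same intent
# receive the same response.
for utterance in ["When is the latest date I can get a vaccination appointment?",
                  "Do you have a free slot today?"]:
    intent = "CheckSlot"  # placeholder for a classifier's prediction
    print(respond(intent))
```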

In this article, we present an overview of intent recognition in chatbots, elaborating on how it is implemented, as well as discussing its advantages and disadvantages.

How is intent recognition formulated?

Intent recognition can be formulated as a multiclass text classification problem. Similar to any other classification task, solving intent recognition also requires a stepwise pipeline: data preprocessing, model training, performance evaluation, and if necessary, hyperparameter optimization.

Data preprocessing

Computers process text data differently than humans do. Text inputs need to be converted into numerical feature representations so that machine learning algorithms can identify patterns and make better classification predictions. Numerous techniques can be used for the conversion, such as Bag of Words, TF-IDF, Word2Vec, and GloVe. These embedding techniques are typically combined with preprocessing steps such as tokenization, stemming, lemmatization, and stop-word removal. If the inputs are speech utterances (audio files), they must first be converted to text.
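As a minimal sketch, the TF-IDF conversion might look like this with scikit-learn (the utterances are hypothetical examples in the spirit of the article); `TfidfVectorizer` also handles tokenization, lowercasing, and optional stop-word removal:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus of user utterances (illustrative only).
utterances = [
    "When is the latest date I can get a vaccination appointment?",
    "Do you have a free slot today?",
    "Good morning",
]

# TfidfVectorizer tokenizes, lowercases, and removes English stop words
# before converting each utterance into a numerical feature vector.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(utterances)

print(X.shape)  # (number of utterances, vocabulary size)
```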

A dataset must be divided into three separate parts for use in model training, validation, and testing. The training set usually holds the largest portion of the data (>60 percent), followed by the validation and test sets. Since intent recognition is a classification problem, the text inputs (or feature vectors) need to be assigned their respective intents as labels. A training set with a balanced distribution across all categories/intents is less biased towards particular categories and therefore tends to yield a better classification outcome.
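A sketch of such a three-way split using scikit-learn's `train_test_split` (the data here is synthetic; `stratify` keeps the intent proportions equal across splits, reducing label bias):

```python
from sklearn.model_selection import train_test_split

# Hypothetical labeled dataset: feature vectors X and intent labels y.
X = list(range(100))
y = ["CheckSlot" if i % 2 == 0 else "Greeting" for i in range(100)]

# First split off the test set (20%), then carve a validation set
# (20% of the remainder), leaving 64% of the data for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.2, stratify=y_train, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 64 16 20
```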

Model training and evaluation

Different algorithms can be trained for intent recognition, ranging from classical machine learning methods, like Naïve Bayes and SVM, to neural network models, like LSTM and BERT. They differ in architecture and computational cost, and therefore in prediction accuracy. The chosen text classifier is trained with the training set, optimized with the validation set, and evaluated with the test set. Cross-validation can be applied at this stage, repeating the steps of model training, validation, and testing with different samples of the data.
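A minimal baseline along these lines, assuming scikit-learn and a tiny hypothetical dataset, could combine TF-IDF features with a Naïve Bayes classifier:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny illustrative dataset; real training data would contain far more examples.
train_texts = [
    "When is the next free appointment?", "Do you have a slot today?",
    "Good morning", "Hello there", "Hi, good evening",
    "Any openings tomorrow?",
]
train_intents = ["CheckSlot", "CheckSlot", "Greeting",
                 "Greeting", "Greeting", "CheckSlot"]

# A classical ML baseline: TF-IDF features feeding a Naive Bayes classifier.
clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(train_texts, train_intents)

print(clf.predict(["good afternoon"]))
```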

Intent recognition is a supervised learning task, in which the model learns to make categorical predictions given the true labels of the data inputs. Performance measures, such as loss and accuracy, are reported after the model is evaluated on each of the three datasets. During training, the model adjusts its parameters to minimize the loss; the chosen metrics then guide which configuration is selected. In the end, the model with the configuration that produces the best outcome is deployed to production.

Chatbot developers do not necessarily have to implement intent recognition with self-built architectures. Chatbot development platforms provide open source architectures that can be trained as part of custom NLU pipelines. One example is Rasa’s DIETClassifier, which performs intent recognition and named entity recognition simultaneously.

Hyperparameter optimization

Improving the prediction performance of machine learning models is strongly associated with trying out different hyperparameters. Hyperparameters, not to be confused with parameters, are the configurations of the model that are fixed before training starts. Examples include the choice of loss function, the learning rate, the number of epochs, and the model architecture. Hyperparameter optimization is the process of finding the hyperparameters that produce the best model outcome for a given dataset.

If done manually, hyperparameter optimization can be a time-consuming task. Libraries such as scikit-learn’s GridSearchCV and Hyperopt can automate the process by repeating model training and evaluation over a given range of hyperparameters and returning the best-performing configuration.
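For instance, a grid search over a TF-IDF/SVM pipeline with scikit-learn's GridSearchCV might look like the following sketch (the data and parameter ranges are purely illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data; replace with a real labeled intent dataset.
texts = ["hi", "hello", "good morning", "book a slot",
         "any appointments?", "free slot today?"] * 5
intents = ["Greeting", "Greeting", "Greeting",
           "CheckSlot", "CheckSlot", "CheckSlot"] * 5

pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("svm", LinearSVC())])

# GridSearchCV retrains and cross-validates the pipeline for every
# hyperparameter combination and keeps the best-scoring one.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "svm__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(texts, intents)

print(search.best_params_)
print(search.best_score_)
```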

The advantages and disadvantages of intent recognition

Computation-wise, a huge variety of algorithms and methodologies is available to solve a multiclass intent classification problem. The outcome of a classification problem is also simpler to evaluate than that of non-classification tasks, as evaluation scores can be calculated based on a binary correct/incorrect check.

By grouping words and text into intents, chatbot conversations become more manageable and can be handled more efficiently. The NLP model is trained on a restricted range of behaviors (a limited set of intents), which reduces uncertainty in the outcome. Because only a fixed number of intents needs to be anticipated, responses can be curated more accurately and provide users with better, more relevant information.

However, determining the list of intents for model training can be challenging. A chatbot cannot categorize a text sentence with a category it was not trained on. Therefore, one must anticipate all possible scenarios a chatbot might encounter in a conversation and train the model with appropriate intent categories and data to get a good outcome. Also, by default, an intent classifier outputs only one prediction at a time. It cannot properly process text sentences that contain two intents, such as CheckSlot and CancelSlot—only one category will be assigned per conversation turn. Fortunately, this “intent bottleneck” can be overcome by implementing methods like forms, retrieval actions, and multi-intents.
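One generic way around the single-intent limitation (separate from the platform-specific methods just mentioned) is multi-label classification, where each utterance may receive several intent labels at once. A sketch with scikit-learn, using hypothetical data and intent names:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical multi-intent data: each utterance can carry several labels.
texts = [
    "book me a slot", "cancel my appointment",
    "cancel my old slot and book a new one",
] * 5
labels = [["CheckSlot"], ["CancelSlot"], ["CheckSlot", "CancelSlot"]] * 5

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)  # one binary column per intent

# One binary classifier per intent; an utterance can activate several at once.
clf = make_pipeline(TfidfVectorizer(),
                    OneVsRestClassifier(LogisticRegression()))
clf.fit(texts, Y)

pred = clf.predict(["cancel the slot and book another"])
print(mlb.inverse_transform(pred))
```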