Chatbot Development

5 Steps for Building a Great NLP Model

If you’re about to embark on your chatbot journey, don’t worry. The steps outlined in this article will help you build a great chatbot that responds to customer requests effectively and avoids using the response, “I’m sorry, I don’t understand.”

By Benoit Alvarez
March 4, 2021

The adoption of chatbots continues to rise, but high-performing bots are rare: a chatbot’s weaknesses often surface after testing it just a couple of times.

After the initial planning to define your chatbot’s purpose and goals, the work starts with mapping out your intents and entities and ends with… well, actually, there is no end, because a chatbot will need continuous monitoring, adjusting, training—and scaling, if you’re successful—throughout its life. Here are five steps to help you build a great natural language processing (NLP) model:

1. Mapping your model

You’ll need access to real customer logs, FAQs, support inboxes, or databases as a starting point. This data won’t necessarily be used as your training data but as a guide to your chatbot intents. The data will need to be divided into categories (which may not be intents at this stage), and these might also need to be split into subcategories. If you can’t obtain this data, you might need the help of a subject matter expert to create an initial set of customer questions.

Map each question to a category using Excel. Subcategories will then start to evolve, creating your intents and entities. For example, a chatbot for a bank might have a category for LENDING, which could then have subcategories for LOANS and MORTGAGES. And then, within the MORTGAGES, you might have further subcategories for mortgage types and so on. Here’s an example of what your intent-mapping document might look like:

As you build your map, consider how questions will be asked. If they will be asked similarly, group them within the same intent and use entities for the variables, as we’ve done in the example.
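To make the grouping idea concrete, here is a minimal sketch of how a few mapping rows could be represented and grouped in code. It borrows the LENDING, LOANS, and MORTGAGES categories from the bank example above; the questions, intent names, and entity names are all hypothetical and stand in for whatever your own spreadsheet contains.

```python
from collections import defaultdict

# Hypothetical mapping rows: (question, category, subcategory, intent, entities).
# Every question, intent name, and entity name here is illustrative only.
MAPPING_ROWS = [
    ("What mortgage rates do you offer?", "LENDING", "MORTGAGES",
     "mortgage_rates", {}),
    ("Can I get a fixed-rate mortgage?", "LENDING", "MORTGAGES",
     "mortgage_apply", {"mortgage_type": "fixed-rate"}),
    ("Can I get a tracker mortgage?", "LENDING", "MORTGAGES",
     "mortgage_apply", {"mortgage_type": "tracker"}),
    ("How do I apply for a personal loan?", "LENDING", "LOANS",
     "loan_apply", {"loan_type": "personal"}),
]


def group_by_intent(rows):
    """Group questions under their intent and collect the entity values seen."""
    intents = defaultdict(lambda: {"questions": [], "entities": defaultdict(set)})
    for question, _category, _subcategory, intent, entities in rows:
        intents[intent]["questions"].append(question)
        for name, value in entities.items():
            intents[intent]["entities"][name].add(value)
    return intents


intent_map = group_by_intent(MAPPING_ROWS)
```

Here the two similarly phrased mortgage questions share one intent, and the part that varies (the mortgage type) becomes an entity rather than a separate intent.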

Your mapping document will continue to evolve, and there may be multiple versions through the build stage.

2. Building your base model

If you’re using real customer logs, you should set aside 20 to 30 percent of them to use as cross-validation data in the future. This is essentially test data that your model will not have been trained on, and it must always be kept separate from the training data. Ensure that this dataset covers all intents evenly. The remainder of the logs can help you devise your training questions (or utterances) at this stage and can be used as a training dataset at the performance-maximization stage. (We will dive deep into this in step 3.) If you don’t have real data, you might consider asking colleagues for some example questions to use as test data (but give them limited information on each intent).
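One way to hold out that test data while keeping every intent covered is to sample the same fraction from each intent. The sketch below assumes your logs have already been labelled as (utterance, intent) pairs; the function name, the 25 percent default, and the sample data are assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_holdout(labelled_logs, holdout_fraction=0.25, seed=0):
    """Split (utterance, intent) pairs into training and held-out test sets,
    sampling roughly the same fraction from every intent."""
    by_intent = defaultdict(list)
    for utterance, intent in labelled_logs:
        by_intent[intent].append(utterance)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for intent, utterances in by_intent.items():
        shuffled = utterances[:]
        rng.shuffle(shuffled)
        # Hold out at least one example so that every intent appears in the test set.
        n_test = max(1, round(len(shuffled) * holdout_fraction))
        test.extend((u, intent) for u in shuffled[:n_test])
        train.extend((u, intent) for u in shuffled[n_test:])
    return train, test


# Hypothetical labelled logs: eight utterances for each of two intents.
sample_logs = [(f"loan question {i}", "loan_apply") for i in range(8)]
sample_logs += [(f"mortgage question {i}", "mortgage_rates") for i in range(8)]
train_set, test_set = split_holdout(sample_logs)
```

Keeping the seed fixed means the same logs always land in the test set, which helps keep test data from drifting into your training data between runs.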

Once you’ve mapped all the main questions into categories, intents, and entities, you’re ready to build your model using your chosen NLP provider. Start by creating a small model: prioritize your intents and add them in phases.

As a guide, aim for about 40 intents initially. Build out each intent with a set of utterances. If you have example data (i.e., real customer logs), you can refer to this to help you devise utterances, but only use it as a guide. It’s important to keep utterances short, concise, and relevant to the intent. Most NLP providers recommend at least five utterances per intent. Having 15 to 20 will give better results.
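A quick check like the following can show which intents still need more utterances. It is only a sketch: the thresholds come from the figures above (at least five, ideally 15 or more), while the function name and the sample training data are hypothetical.

```python
MIN_UTTERANCES = 5      # the minimum most providers recommend, per the text
TARGET_UTTERANCES = 15  # lower end of the 15-to-20 range


def utterance_coverage(training_data):
    """Return (intents below the minimum, intents between minimum and target)."""
    too_few = sorted(
        intent for intent, utterances in training_data.items()
        if len(utterances) < MIN_UTTERANCES
    )
    below_target = sorted(
        intent for intent, utterances in training_data.items()
        if MIN_UTTERANCES <= len(utterances) < TARGET_UTTERANCES
    )
    return too_few, below_target


# Hypothetical training data: intent name -> list of utterances.
training_data = {
    "loan_apply": ["utterance"] * 3,
    "mortgage_rates": ["utterance"] * 8,
    "mortgage_apply": ["utterance"] * 16,
}
too_few, below_target = utterance_coverage(training_data)
```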

As you build out your intents, you may need to revisit your mapping document and make further changes, either by merging or separating intents, or perhaps by adding more entities.

3. Maximize your model’s performance

Once you’ve built out all your intents and added entities as appropriate, you’ll need to test and train your model, and there are various ways of doing this. Two popular methods are k-fold cross-validation testing and test-data validation, visualized by building a confusion matrix. The result of those tests will help you identify any weaknesses. K-fold is trickier to interpret, because each time you modify or add new training data, the “folds” of k-fold change. Test data is great, but you’ll have to keep changing it as you refine the mapping of your model.
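Both methods can be sketched in a few lines. The example below builds a confusion matrix from expected and predicted intents, derives a per-intent score from it, and generates k-fold train/test index splits. The predictions are hard-coded stand-ins for whatever your NLP provider returns, and all function names are assumptions for illustration.

```python
from collections import Counter


def confusion_matrix(expected, predicted):
    """Count how often each (expected intent, predicted intent) pair occurs."""
    return Counter(zip(expected, predicted))


def per_intent_recall(matrix):
    """For each expected intent, the share of its test utterances predicted correctly."""
    totals, correct = Counter(), Counter()
    for (expected_intent, predicted_intent), count in matrix.items():
        totals[expected_intent] += count
        if expected_intent == predicted_intent:
            correct[expected_intent] += count
    return {intent: correct[intent] / totals[intent] for intent in totals}


def k_fold_indices(n_items, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


# Hypothetical test results: what the test data expects vs. what the model returned.
expected = ["loan_apply", "loan_apply", "mortgage_rates", "mortgage_rates"]
predicted = ["loan_apply", "mortgage_rates", "mortgage_rates", "mortgage_rates"]

matrix = confusion_matrix(expected, predicted)
recall = per_intent_recall(matrix)
folds = list(k_fold_indices(6, 3))
```

Off-diagonal entries in the matrix, such as a loan question being classified as a mortgage question, point to the intent pairs that are being confused with each other.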

There are specialist tools available to help with model testing and training, such as QBox and Botium. These will help to make the process more efficient and improve your model’s performance. QBox helps improve accuracy and performance in a matter of minutes by analyzing and benchmarking your chatbot training data, giving you insight into where your chatbot does and doesn’t perform, and why. You can see your chatbot’s performance at the model level, intent level, and utterance level, and even drill down to word-by-word analysis. Botium is geared more toward test automation (a bit like Selenium): it runs the tests that you have prepared.

Once you’ve maximized the performance of your training data, you can either add more intents (if the first 40 weren’t enough) and go back to step 2—or go live.

4. Going live and monitoring

Following a robust period of testing and training, you’ll be ready to launch your chatbot. But the work doesn’t stop at this stage—there’ll be ongoing monitoring and maintenance through additional training and retraining.

The monitoring will involve checking that the user logs are returning the correct intents with good confidence, that entities are being detected and used correctly, and that dialog flows are as expected. You might need to update responses to accommodate user demands and ensure your chatbot stays relevant, and you’ll no doubt have new intents crop up that you need to note and monitor to see if they’re popular new subject areas. As you do this, it’s good practice to update your mapping document.
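Part of this checking can be automated. The sketch below flags low-confidence log entries for manual review and counts which intents they were matched to. The log format (a dict with "text", "intent", and "confidence" keys) and the 0.7 threshold are assumptions; your NLP provider’s export will have its own shape, and the right threshold depends on your model.

```python
from collections import Counter

CONFIDENCE_THRESHOLD = 0.7  # hypothetical cut-off; tune it for your own model


def flag_for_review(log_entries, threshold=CONFIDENCE_THRESHOLD):
    """Return the log entries whose confidence falls below the threshold."""
    return [entry for entry in log_entries if entry["confidence"] < threshold]


def low_confidence_by_intent(log_entries, threshold=CONFIDENCE_THRESHOLD):
    """Count low-confidence entries per returned intent."""
    return Counter(entry["intent"] for entry in flag_for_review(log_entries, threshold))


# Hypothetical live log entries.
live_logs = [
    {"text": "what are your mortgage rates", "intent": "mortgage_rates", "confidence": 0.93},
    {"text": "can i borrow for a car", "intent": "loan_apply", "confidence": 0.55},
    {"text": "do you do pet insurance", "intent": "loan_apply", "confidence": 0.31},
]
needs_review = flag_for_review(live_logs)
problem_intents = low_confidence_by_intent(live_logs)
```

Entries flagged this way are good candidates for new training utterances, or for new intents if the same subject keeps appearing.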

Daily monitoring and training will be needed for at least the first few weeks after launch. This can then progress to weekly monitoring and training, and as your bot matures and performance is proven and stable, this could reduce to monthly training (although weekly monitoring is still advisable). Most NLP providers let you extract live logs for this.

5. Adjust and scale

Once your chatbot has been stabilized and you have resolved any teething problems, you’ll be ready to adjust your bot to optimize its performance even more. Take the time to reassess any problem intent areas that might have emerged from monitoring, and merge or separate them as necessary (although you might have done this already in step 4). Revisit any dialog flows that might not be running smoothly, and update content to include additional questions associated with any intent.

To scale your chatbot, go back to step 2. Make scaling a gradual process if you have a lot of new intents to add. Prioritize the intents and add them in batches of 20 to 40 at a time. Scaling tends to introduce regressions in intents that previously performed well, so it’s really important to keep checking your chatbot’s performance after each batch.
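As a rough sketch of that routine, the code below splits a prioritized list of new intents into batches and compares per-intent scores from before and after a batch to spot regressions. The scores would come from whatever test you ran in step 3; the function names, the tolerance parameter, and the sample numbers are hypothetical.

```python
def batches(intents, batch_size=20):
    """Yield successive batches from a prioritized list of intents."""
    for start in range(0, len(intents), batch_size):
        yield intents[start:start + batch_size]


def regressions(before, after, tolerance=0.0):
    """Return intents whose score dropped by more than the tolerance.

    Both arguments map intent name -> score (for example, per-intent accuracy).
    Intents that exist only in `after` are new and are not counted as regressions.
    """
    return {
        intent: (before[intent], after[intent])
        for intent in before
        if intent in after and before[intent] - after[intent] > tolerance
    }


# Hypothetical backlog of 50 prioritized intents, added 20 at a time.
new_intents = [f"intent_{i}" for i in range(50)]
batch_sizes = [len(batch) for batch in batches(new_intents, batch_size=20)]

# Hypothetical per-intent scores before and after adding one batch.
scores_before = {"loan_apply": 0.90, "mortgage_rates": 0.80}
scores_after = {"loan_apply": 0.85, "mortgage_rates": 0.80, "intent_0": 0.70}
dropped = regressions(scores_before, scores_after)
```

Running the same comparison after every batch makes it easier to trace a drop in performance back to the batch that caused it.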


The key to building a great chatbot is to start with a good base, using a mapping document. Start small, and gradually build up your chatbot’s scope. Once it’s live, carry on monitoring and improving your chatbot with regular training before you attempt scaling. It’s essential to find a systematic way to measure the performance of your training data to benchmark your work.