Speech Recognition: Teaching Your Bot to Speak
Verbal communication is the preferred form of communication for most people, so it makes sense to have your chatbot communicate in the same way. Learn the role that speech recognition plays in making your bot understand speech and how speech synthesis can help you build out your bot’s voice, emotions, and language.
December 4, 2018
In one form or another, chatbots have become the norm in many industries. Whether in healthcare, retail, banking, or customer support, you’ve probably encountered a chatbot that helped you with a request. In fact, according to Salesforce, 69 percent of consumers say that they prefer interacting with chatbots for quick communication with brands.
The two types of chatbots, messenger chatbots and voice chatbots, share the same core technologies: machine learning (ML) and natural language processing (NLP). What sets them apart is the way in which users interact with them. Messenger chatbots are hosted on messaging platforms (like Facebook or Slack), and voice chatbots are made for hands-free use (for example, automated call support or a virtual assistant, such as Amazon Alexa). And, although messenger chatbots have been around for years, voice chatbots are becoming more and more popular as their accuracy and knowledge increase.
It’s no wonder that interacting with a voice chatbot in a consumer setting is more and more common. As an integral part of our lives, verbal communication is used to impart knowledge and to inform others of our needs. And it is one of the more satisfying forms of communication, thanks to the ability to discern nuanced meaning from tone and inflection. Because they work with speech, voice chatbots are better equipped to examine the behavior and needs of whomever they are interacting with, making it easier to ask follow-up questions and to develop an appropriate response.
Recognizing consumer requests
A great voice chatbot should be able to seamlessly respond to consumer inquiries or requests and to convince users that they are talking to a real person, not to a bot. Speech recognition and the knowledge base that the chatbot’s deep learning models draw on play the biggest role in making this possible.
Human speech varies greatly, and it is the responsibility of the voice chatbot to decipher what a consumer is requesting. Speech recognition tools differ in capabilities, but they should enable the chatbot to understand the users. If they don’t, users can become frustrated at the chatbot’s lack of understanding, abandon their original request, and lose interest in the product or service.
To avoid this, your voice bot should include barge-in and timeout capabilities. Barge-in capabilities allow users to interrupt the chatbot while it is talking. This is important for users who are choosing an option from a list or looking to stop the chatbot mid-sentence. Speech recognition allows the chatbot to identify when the user starts talking, stop its own response or prompt, and then listen to the request instead of forcing the user to wait until the bot is finished speaking.
Then timeouts come into play. These allow the chatbot to register that the user has stopped talking and to recognize that it is now the chatbot’s turn to develop a response. This characteristic helps the chatbot seem as lifelike as possible by mimicking what would happen in a conversation between two humans.
Without the ability to discern interruptions and the end of a request, chatbots are unable to fulfill their role of interacting with users the same way in which a real person would.
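The barge-in and timeout behavior described above can be sketched as a small turn-taking state machine. This is only an illustration, not how any particular service implements it: it assumes the speech recognizer already supplies one "is the user talking?" flag per audio frame, and the class name, state names, and the three-frame timeout are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TurnTaker:
    """Tracks whose turn it is from a stream of voice-activity flags."""

    silence_timeout_frames: int = 3  # hypothetical: quiet frames that end a user turn
    state: str = "bot_speaking"      # "bot_speaking", "user_speaking", or "bot_thinking"
    silent_frames: int = 0
    events: list = field(default_factory=list)

    def on_frame(self, user_is_talking: bool) -> None:
        if self.state == "bot_speaking" and user_is_talking:
            # Barge-in: the user interrupted, so stop the prompt and listen.
            self.state = "user_speaking"
            self.silent_frames = 0
            self.events.append("barge_in")
        elif self.state == "user_speaking":
            if user_is_talking:
                self.silent_frames = 0
            else:
                self.silent_frames += 1
                if self.silent_frames >= self.silence_timeout_frames:
                    # Timeout: the user has finished, so the bot takes its turn.
                    self.state = "bot_thinking"
                    self.events.append("end_of_speech")


taker = TurnTaker()
for flag in [False, True, True, False, True, False, False, False]:
    taker.on_frame(flag)
print(taker.events)
```

A short pause in the middle of the request (the single quiet frame) does not end the turn. Only a run of quiet frames as long as the timeout does, which is what keeps the bot from cutting users off mid-thought.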
Developing a chatbot
Developing a chatbot doesn’t start at your computer. You need to first pinpoint the intent of your chatbot, decide how you’re going to build it, and then determine the framework and technologies it needs to successfully help users.
Defining the purpose of your voice chatbot
The first step to developing a voice chatbot is to define its purpose. For example, which industry are you building your chatbot for? What is the purpose of the user’s input? Are you trying to drive sales? Process payments? What problem are you trying to solve?
It is imperative to define and answer these questions to ensure that you are fulfilling the purpose of your bot when you build it.
To code or not to code?
After you have laid out the purpose of your chatbot, the problem it is going to solve, and how users will interact with it, you must then decide the method by which you want to develop it: either hand code the bot (yourself or by hiring a developer) or build it with a bot-building service. If you choose the latter, you can leverage a service like Amazon Lex, Dialogflow, or IBM Watson to help you develop the bot.
When making this decision, take into account what you need to build your bot. Many factors, including how much coding experience you have, the level of customization your bot requires, and the amount of money you are willing to spend, can help determine which route you should take.
Hand coding—writing the code to create your voice chatbot—allows for complete customization of the design of the bot, including platform integrations and e-commerce features. However, any additions or changes that need to be made to your bot can be time-consuming, since you have to edit the code yourself. And, if you choose to hire a developer to create your bot, the initial coding and any changes can be costly.
If you do not have extensive coding skills or don’t want to pay for a developer, services that lead you through the development and that provide templates for building your bot may be the better option. Many of these bot-building services are geared toward simpler bot designs since the level of customization available is not as extensive. This means that any specific features you may have in mind for your bot might not be available. However, a bonus to using a bot-building service is the general cost savings. As a cheaper alternative to hiring a developer, some bot-building services allow you to create your bot for free and only charge for deployment and customer interaction. Others allow bot owners with a small user base to pay per consumer interaction with the bot.
Building your bot’s conversational framework
Next comes building the conversational framework, or user flow, of your bot. You should map out the user flow starting with how the bot should respond after a user calls upon it, and finishing with the resolution of the conversation.
Most voice chatbots follow a general conversational framework, which begins with a user making a request of some sort. After a user interacts with the chatbot vocally, the chatbot generally gives options on how to proceed, all leading to a call to action.
If the conversational flow you want for your chatbot differs from this, map out—from start to finish—how the user works their way through the conversation with the bot and how the bot should respond.
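One way to map out a user flow before building anything is to write it down as a table of states, prompts, and the options that lead from one state to the next. The sketch below does that for a made-up pizza-ordering bot. Every state name, prompt, and keyword is hypothetical, and it matches on whole words only, so it assumes the recognizer returns plain text without punctuation.

```python
# Hypothetical flow for a pizza-ordering bot; every state and phrase is illustrative.
FLOW = {
    "greeting": {
        "prompt": "Hi! Would you like to order or check an order?",
        "options": {"order": "choose_size", "check": "order_status"},
    },
    "choose_size": {
        "prompt": "Small, medium, or large?",
        "options": {"small": "confirm", "medium": "confirm", "large": "confirm"},
    },
    "order_status": {"prompt": "Your order is on its way.", "options": {}},
    "confirm": {"prompt": "Great, say 'pay' to check out.", "options": {"pay": "done"}},
    "done": {"prompt": "Thanks! Your order is placed.", "options": {}},
}


def next_state(state, utterance):
    """Return the next state, or stay in the current one when no option matches."""
    words = utterance.lower().split()
    for keyword, target in FLOW[state]["options"].items():
        if keyword in words:
            return target
    return state


def run(utterances, start="greeting"):
    """Walk the flow and return the list of visited states."""
    path = [start]
    for utterance in utterances:
        path.append(next_state(path[-1], utterance))
    return path


print(run(["I want to order", "a large one", "pay"]))
```

Here the final state, "done", plays the role of the call to action. Staying in the same state when nothing matches gives the bot a natural place to repeat its prompt.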
The role of speech recognition
Speech recognition technology takes an audio sample of a user’s request and breaks it down into smaller parts, based on voice frequency and pitch, before feeding them into a neural network. The neural network then finds patterns in the audio sample.
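To make "breaking audio into smaller parts" concrete, here is a minimal sketch that cuts a signal into short overlapping frames and finds the strongest frequency in each one. It is a simplification under stated assumptions: the 8,000 Hz sample rate, the frame size, and the function names are all chosen for illustration, the input is a synthetic 500 Hz tone rather than real speech, and production recognizers typically extract richer spectral features than a single dominant frequency.

```python
import cmath
import math


def split_into_frames(samples, frame_size, hop):
    """Cut a sample list into overlapping fixed-size frames."""
    return [
        samples[start:start + frame_size]
        for start in range(0, len(samples) - frame_size + 1, hop)
    ]


def dominant_frequency(frame, sample_rate):
    """Return the strongest frequency (in Hz) in a frame, using a naive DFT."""
    n = len(frame)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):  # skip the DC bin and stop below Nyquist
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


SAMPLE_RATE = 8000  # Hz; an assumed telephone-quality rate
tone = [math.sin(2 * math.pi * 500 * i / SAMPLE_RATE) for i in range(800)]
frames = split_into_frames(tone, frame_size=160, hop=80)
features = [dominant_frequency(frame, SAMPLE_RATE) for frame in frames]
print(len(frames), features[0])
```

Each frame covers 20 milliseconds of audio and overlaps its neighbor by half, so the per-frame values form the sequence that a neural network would then look through for patterns.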
Recurrent neural networks can remember previous audio patterns and can use that data to help build out future responses. This accumulated data also makes it easier for your chatbot to understand what a user wants, without being distracted or confused by background noise, dialects, or accents, because it has a larger pool of data to sift through to find similar requests and responses.
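The "memory" of a recurrent network comes from feeding its previous state back in at every step. The toy example below shows that idea with a single recurrent unit. The weights are hand-picked, hypothetical values rather than trained ones, and a real speech model would have many units and learned weights.

```python
import math


def rnn_step(hidden, x, w_hidden, w_input):
    """One recurrent step for a single unit: h_new = tanh(w_hidden*h + w_input*x)."""
    return math.tanh(w_hidden * hidden + w_input * x)


def run_rnn(inputs, w_hidden=0.9, w_input=1.0):
    """Feed a sequence through the unit and return the hidden state after each step."""
    hidden = 0.0
    states = []
    for x in inputs:
        hidden = rnn_step(hidden, x, w_hidden, w_input)
        states.append(hidden)
    return states


with_early_signal = run_rnn([1.0, 0.0, 0.0])
without_signal = run_rnn([0.0, 0.0, 0.0])
print(with_early_signal[-1] > without_signal[-1])
```

Both sequences end with the same two silent inputs, yet the final states differ, because the first sequence still carries a trace of the signal it saw at the start.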
To further aid in your chatbot’s understanding, you can load in example phrases and natural language models. This information, in addition to the recurrent neural network, builds a deeper knowledge base for your bot to reference. The more data your bot’s artificial intelligence (AI) system has to pull from, the more intelligent it is, allowing it to develop a better response. And, if you have chosen to build a bot with a service, AI and speech recognition technologies are generally available within the platform.
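As a rough picture of how example phrases help, the sketch below matches a transcribed request against a few example phrases per intent and picks the closest one by word overlap. The intents, phrases, and function names are hypothetical, and the overlap score is a deliberately simple stand-in for the language models a bot-building service would provide.

```python
# Hypothetical intents and example phrases; a real bot would load many more.
EXAMPLES = {
    "check_balance": ["what is my balance", "how much money do I have"],
    "transfer_money": ["send money to a friend", "transfer funds to savings"],
}


def tokenize(text):
    """Lowercase a phrase and return its set of words."""
    return set(text.lower().split())


def match_intent(utterance, examples=EXAMPLES):
    """Return (intent, score) for the example phrase sharing the most words.

    The score is the Jaccard overlap: shared words divided by all distinct words.
    """
    words = tokenize(utterance)
    best_intent, best_score = None, 0.0
    for intent, phrases in examples.items():
        for phrase in phrases:
            phrase_words = tokenize(phrase)
            score = len(words & phrase_words) / len(words | phrase_words)
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent, best_score


print(match_intent("what is my account balance"))
```

Adding more example phrases per intent gives the matcher more chances to find a close fit, which is the same reason a larger pool of data helps a real bot.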
Giving your bot a home
Now that you’ve developed your bot, the next step is to give it a home: a domain identity. Obtaining a .BOT domain makes it possible for you and your users to find your bot. You can register a .BOT domain through a preferred bot builder, like GetBot, Pandorabots, or EnCirca.
Having a .BOT identity allows you to showcase your bot across various channels and to drive user engagement. However, to get a .BOT domain, you must have verified and published your bot with a supported bot-building service like Amazon Lex, Microsoft Bot Framework, or Dialogflow. Additionally, registering your voice chatbot with a .BOT domain positions your business and technology, making it easy to identify your bot offering.
Teaching your chatbot to talk
The same basic developmental flow is used for most voice chatbots and is made easy with many of the chatbot services available in the marketplace. However, many of those services provide a similar-sounding synthetic voice, leaving your bot without much of a personality. But it is possible to customize the language and accent of the speech synthesis voice. You just have to have the knowledge database to support and back up different languages, colloquialisms, and slang.
Speech Synthesis Markup Language (SSML) enables this by letting you modify the voice of your chatbot with SSML tags. It is supported by many chatbot-building services in which a synthetic voice is already provided. Using SSML, you can code emotion, pronunciation, speech rate, and natural pauses into your bot. If you do not want to use the provided voice, it is also possible to synthesize your own voice or the voice of others who are pertinent to your brand or product.
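For a sense of what those tags look like, here is a small sketch that assembles an SSML string from a few standard elements: prosody for speech rate and pitch, break for pauses, and emphasis for stress. The helper function names are hypothetical conveniences, not part of any service’s SDK. Support for individual tags and attribute values varies by service and voice, and some services also expect version and namespace attributes on the speak element, so check the documentation for the one you use.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody(text, rate="medium", pitch="medium"):
    """Wrap text in a prosody tag that sets speech rate and pitch."""
    return (
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}</prosody>"
    )


def pause(milliseconds):
    """Insert a natural pause of the given length."""
    return f'<break time="{int(milliseconds)}ms"/>'


def emphasis(text, level="strong"):
    """Stress a word or phrase."""
    return f"<emphasis level={quoteattr(level)}>{escape(text)}</emphasis>"


def speak(*parts):
    """Join already-escaped parts into one SSML document."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(
    prosody("Welcome back!", rate="fast", pitch="high"),
    pause(400),
    escape("Your order is "),
    emphasis("ready"),
    escape("."),
)
print(ssml)
```

Escaping the text before wrapping it matters in practice, because a stray ampersand or angle bracket in a product name would otherwise make the markup invalid.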
Starting to speak with your bot
Voice chatbot development can be fairly simple with services like Amazon Lex or Dialogflow. However, the most vital and challenging part of making a successful voice chatbot is building a deep learning knowledge base. This is a database made up of either recorded chatbot conversations with users or data you input directly, which your bot can reference when working to understand and respond to another voice. The only real way to determine your bot’s effectiveness is to test it and to continually add to the data that your bot sifts through.
The ultimate goal is to create a voice chatbot that serves and fulfills its purpose. And this is achieved if the voice chatbot can provide meaningful responses and can continually learn from prior interactions as its knowledge base grows.
Now that you understand the value of a voice chatbot and how to make it speak, the next step is to start building. After you have mapped out the conversational flow, technologies, and features you want your chatbot to have, use our guide, “How to Choose a Framework for Your Next Bot Build,” to help you to build your next bot.