
Service Chatbots: How Voice Recognition Is Making Them More Sophisticated

Voice recognition, combined with progress in deep learning and natural language processing, is allowing chatbots to respond based on understanding. Learn more about some of the challenges facing bot developers and what needs to be done so that bots may be able to replace humans for customer service interactions.

May 22, 2018

Chatbots, or bots, with customer service skills feel like a thing of the future, but increasingly are things of the present. And voice recognition is making them even better at their jobs.

How many times have you called the bank, pharmacy, or power company and tried to interact with their voice recognition technology? And how many of those times have you given up and used the keypad to repeatedly hit “0”—until you’re eventually put on hold for a human agent?

Progress in voice recognition will make that experience a thing of the past. As of mid-2017, voice recognition platforms Alexa, Cortana, Google Assistant and Siri report a word error rate of roughly 5 percent. Another advance in voice recognition comes from speech-to-text software, which allows users to dictate documents rather than type them.
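For readers unfamiliar with the metric, word error rate is commonly computed as the number of word-level substitutions, deletions, and insertions needed to turn the recognized text into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that calculation, not the scoring method any of the platforms above actually uses; the function name and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missed)
                curr[j - 1] + 1,     # insertion (extra word recognized)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical 20-word transcript with one misrecognized word.
spoken = ("please transfer fifty dollars from my checking account to my "
          "savings account and send me a confirmation by email today")
recognized = spoken.replace("fifty", "fifteen")
print(word_error_rate(spoken, recognized))  # 0.05
```

One wrong word in twenty gives a 5 percent error rate, which shows why even a low figure can still change the meaning of a request.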

As technology works to perfect voice-based personal assistants, an increasing number of companies are trying to provide intelligent service bots that can converse with consumers much as a human agent would. Although one functionality is spoken and the other written, they share technologies—machine learning and natural language processing (NLP). Bots benefit from the developments of voice recognition technologies and from the increasing amount of data that voice recognition provides.

Voice recognition challenges

In most ways, the priorities of voice recognition tools and those of service bots are the same. They each need to interpret a string of words as a meaningful phrase and determine intent. In the case of bots, they also need to generate a relevant response.

The challenges voice recognition professionals are working to overcome include background noise, accents or dialects, and speed of speech. Additionally, bots have limited knowledge of language and can’t always interpret the context of a sentence to discern the correct meaning of words that have homophones. Finally, bots need to be able to parse not only words and phrases but also the meaning of entire sentences. When you think about how many different ways there are to say one thing, you understand how big this hurdle is.

The good news is that the field is progressing quickly. Speech recognition technology increasingly uses statistical predictive models to better understand words, regardless of atypical usage or dialectal variations. In machine learning, there are experiments with neural networks that allow bots to proactively parse sentences. Headway in natural language processing and sentiment analysis is helping bots to better understand intent. These advances in voice recognition will be integrated into customer service bots to make them more useful.

Machine learning produces deep learning

In the early days, customer service bots would offer a number of choices and would give additional options based on your selection from a list, similar to the functionality of a simple conversational interface (CI). Now, these bots increasingly ask users to state their need and then use those words to try to work out a response. These changes come from advances in machine learning, specifically deep learning, which is pushing the field of machine learning forward.
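The difference between the two styles can be sketched in a few lines. This is an illustration only: the menu entries, intent names, and keyword sets are hypothetical, and simple keyword overlap stands in for the trained models a real service bot would rely on.

```python
# Hypothetical menu for a list-driven bot: the user must pick a number.
MENU = {
    "1": "check_balance",
    "2": "report_outage",
    "3": "refill_prescription",
}

# Hypothetical keyword sets for a free-text bot. A production system
# would learn these associations from data rather than hard-code them.
INTENT_KEYWORDS = {
    "check_balance": {"balance", "account", "money"},
    "report_outage": {"outage", "power", "electricity"},
    "refill_prescription": {"refill", "prescription", "medication"},
}


def menu_bot(choice: str) -> str:
    """Map a menu selection to an intent; anything else is unknown."""
    return MENU.get(choice.strip(), "unknown")


def free_text_bot(utterance: str) -> str:
    """Pick the intent whose keywords overlap most with the utterance."""
    words = {w.strip(".,!?").lower() for w in utterance.split()}
    best_intent, best_overlap = "unknown", 0
    for intent, keywords in INTENT_KEYWORDS.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best_intent, best_overlap = intent, overlap
    return best_intent


print(menu_bot("2"))
print(free_text_bot("My power has been out since noon"))
```

Both calls arrive at the same intent, but the second lets the customer describe the problem in their own words. It also shows the limit of fixed rules: a phrasing that uses none of the listed keywords falls through to "unknown", which is the gap that learned models aim to close.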

Deep learning is basically a process of giving a computer enough data to enable it to make decisions about other data rather than relying on programming for responses. This is accomplished with neural networks, which aim to imitate functions of the human brain’s neural system—that is, perceiving, retaining, and adapting to stimuli from the surrounding environment. The goal is to create AI that can understand context and variations in speech to then arrive at a correct interpretation and to generate a useful response.

Traditional machine learning relies on providing data sets that computers can draw from to respond to queries, whereas neural networks aim to allow computers to respond to input beyond what can be predicted by that data. For all conversational modes (voice recognition bots as well as text-based service bots), these improvements mean that companies can provide more accurate communication with users.

Auditory recognition: “Hearing” correctly

At the same time that voice recognition foundations are being built, auditory recognition—that is, recognizing the sounds humans make and converting them to text—is also making strides. And deep learning is improving speech recognition. Ultimately, voice interfaces are getting better at distinguishing a human voice from other noise, recognizing distinct words, and correctly identifying word meanings.

As you explore bots in general and service bots in particular, one important thing to keep in mind is that these are interdependent factors: deep learning helps voice recognition, which in turn gives professionals more data with which to drive machine learning and NLP forward. Companies trying to provide successful personal assistants and those offering speech-to-text software, such as Dragon, are highly motivated to create better speech recognition products. Service bots don’t require speech recognition to function but will benefit from the work done in that area all the same.

Natural language processing is advancing

The goal of NLP is to enable conversations between humans and computers that make sense to humans. Rather than treating each word as a symbol, computers using natural language processing aim to understand whole sentences. This is hardest in spoken language, with its natural variations in volume, cadence, and word choice and its deviations from the more formal written language that most of us (including computers) are trained to use.

NLP is benefitting from advances in deep learning and in the use of neural networks. Training a computer in NLP relies on the use of a predetermined set of data (called a corpus) to teach it language. However, one major limitation is in the corpus itself, which requires as much written text as is available. Speech recognition helps by providing a wider range of data than would otherwise be available, and it’s also relevant to the functions of bots using NLP. As previously mentioned, the goal is to move to programs that can not only draw on those training sets but also know enough about language to parse input that is not explicitly included in that data.

The next step, of course, is to create meaningful and relevant responses. This is where bots enter the scene. Any customer service situation that requires interaction with a human is a good opportunity for a bot equipped with NLP skills. The value here is not only in savings of employee time and company cost but also in customer engagement. Only a service bot with strong NLP function can create something that approaches humanlike engagement. As NLP improves, the sense of interaction that customers experience with bots—and therefore with a company—will also improve.

Sentiment analysis

Communication is about more than parsing sentences; it’s also about recognizing an interlocutor’s feelings and attitudes. In customer service, this skill can make the difference between a satisfied customer and an angry one. And this is where sentiment analysis comes in.

Sentiment analysis often refers to analyzing social media to get an overall sense of how satisfied or frustrated consumers are with a brand. It’s also an aspect of customer service that employs speech-to-text capabilities to analyze real-time conversations in call centers and to provide companies with data about their customers. Beyond its QA purpose, sentiment analysis data collected via speech-to-text technology can be applied to bot performance. Aspects of spoken interactions, such as pauses in speech or word choices that indicate emotional shifts, are also relevant in written interactions.

If they are to successfully “replace” humans, customer service bots will need to have these capabilities, which are required to keep users engaged in the conversation long enough to meet the users’ needs or to determine when a human customer service agent needs to be brought into the conversation. In short, service bots function better with more emotional intelligence.
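As a rough illustration of the handoff decision described above, the sketch below scores each customer message against small word lists and escalates to a human agent once the running total drops to a threshold. Everything here is an assumption made for the example: the word lists, the threshold, and the function names are hypothetical, and a real system would use a trained sentiment model rather than a fixed lexicon.

```python
# Hypothetical sentiment lexicons, far smaller than anything usable in practice.
NEGATIVE = {"angry", "frustrated", "useless", "terrible", "cancel", "ridiculous"}
POSITIVE = {"thanks", "great", "perfect", "helpful", "good"}


def sentiment_score(message: str) -> int:
    """Count positive words minus negative words in one message."""
    words = [w.strip(".,!?").lower() for w in message.split()]
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


def should_escalate(messages: list[str], threshold: int = -2) -> bool:
    """Hand off to a human once the conversation's total score is low enough."""
    total = sum(sentiment_score(m) for m in messages)
    return total <= threshold


conversation = [
    "This is useless.",
    "I am frustrated, just cancel my account!",
]
print(should_escalate(conversation))
```

Summing over the whole conversation rather than reacting to a single message is a design choice for this sketch: it keeps one sharp word from triggering a handoff while still catching a customer whose frustration is building.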


Right now, chatbots either function on conversational interfaces, which can give only a limited set of responses, or gamble on NLP, which still often leads to frustrated users. Conversational interfaces will stick around, but they will be employed according to specific user needs and not because they’re the best option available. Progress in deep learning and natural language processing will increasingly allow chatbots not only to react based on programming but also to respond based on understanding. When professionals figure out how to integrate sentiment analysis into service bot technology, we’ll see the beginning of bots that can successfully replace humans for customer service interactions.

Progress in these technologies applies in some way to both voice-based and text-based bots and, since digital personal assistants aren’t going anywhere, we can expect bots to improve right along with them.