Multi-Chatbot Systems Could Enable Smoother Machine-Human Dialogues
Ensemble-based deep reinforcement learning (DRL) technique could enhance interactions between chatbots and humans.
November 20, 2019
Researchers at the Lincoln Centre for Autonomous Systems (L-CAS) and Samsung Research have recently developed an ensemble-based deep reinforcement learning (DRL) technique that could enhance interactions between chatbots and humans. Their paper, published in Elsevier’s Neurocomputing journal, introduces a new approach to training multiple chatbots using numerical rewards according to the quality of their dialogues.
The initial approach
“The first and main idea that we evaluated was to train a chatbot from example text-based dialogues and without any labeled data using a machine learning approach called value-based DRL,” Heriberto Cuayáhuitl, one of the researchers who carried out the study, said. “In this type of machine learning, a DRL agent calculates the importance of state-action pairs (i.e., situation-decisions) in order to choose actions with the highest long-term expected numerical rewards.”
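To make the idea of scoring state-action pairs concrete, here is a minimal tabular Q-learning sketch. It is not the researchers' neural implementation: the state names, action names, learning rate `ALPHA` and discount factor `GAMMA` are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical hyperparameters, chosen only for illustration.
ALPHA = 0.5   # learning rate
GAMMA = 0.9   # discount factor for future rewards

def q_update(q, state, action, reward, next_state, actions):
    """Apply one Q-learning update to the table `q` in place."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (target - q[(state, action)])

def greedy_action(q, state, actions):
    """Pick the action with the highest estimated long-term reward."""
    return max(actions, key=lambda a: q[(state, a)])

actions = ["greet", "ask_question"]   # hypothetical action names
q = defaultdict(float)                # every state-action value starts at 0.0

# One observed transition: greeting at the start of a dialogue earned +1.
q_update(q, "start", "greet", 1.0, "greeted", actions)
print(greedy_action(q, "start", actions))
```

After a single rewarded step, the value of greeting in the `start` state rises above the alternative, so the greedy choice becomes `greet`. A deep variant would replace the table with a neural network that estimates the same values.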
One of the key goals of the research carried out by Cuayáhuitl and his colleagues was to develop a deep learning approach that would spare developers the hassle of having to label training data for their chatbots, which typically requires considerable time and resources. The researchers tried to achieve this using several different machine learning techniques, the first of which is value-based DRL.
“While reviewing state-of-the-art approaches commonly used to develop neural-based chatbots, we noted that this machine learning approach (i.e., value-based DRL) had not been applied to chatbots,” Cuayáhuitl explained. “To achieve this, we used 'automatic action discovery' via another machine learning approach called unsupervised clustering.”
In their study, the researchers used sentence clustering to define the actions of DRL agents. These machine learning agents learn to complete a task—in this case, to communicate with humans—based on numerical rewards that they receive for each action they perform.
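As a rough illustration of how clustering can turn raw sentences into a small set of actions, the sketch below groups toy sentences with plain k-means. The bag-of-words vectors, the example sentences and the fixed starting centroids are assumptions made for this example; the article says only that the researchers clustered sentences, not which representation or algorithm settings they used.

```python
def bag_of_words(sentence, vocab):
    """Count how often each vocabulary word occurs in the sentence."""
    words = sentence.lower().split()
    return [float(words.count(w)) for w in vocab]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, init_indices, iterations=10):
    """Plain k-means; centroids start at the vectors named by init_indices."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy sentences, invented for this example.
sentences = [
    "hello there",
    "hello friend",
    "goodbye for now",
    "goodbye friend for now",
]
vocab = sorted({w for s in sentences for w in s.lower().split()})
vectors = [bag_of_words(s, vocab) for s in sentences]

# Each cluster id can then serve as one discrete action for an agent.
labels = kmeans(vectors, init_indices=[0, 2])
print(labels)
```

Here the two greetings fall into one cluster and the two farewells into another, so an agent would choose between two actions rather than four individual sentences.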
Risks & rewards
The researchers decided to implement an effective dialogue reward scheme, in which each DRL agent receives points for choosing the right sentence out of a list of options. On the other hand, when the agent chooses a random sentence from other dialogues, it loses points, as this typically results in incoherent dialogues. Penalizing the machine learning agent when it selects random sentences teaches it to avoid this behavior, improving its performance over time.
“In stark contrast with previous studies, we trained multiple DRL agents (i.e., a multi-chatbot) instead of a single agent,” Cuayáhuitl said. “We found that our system was able to remember better how to behave in its given environment than a single agent (i.e., it was less ‘confused’).”
Using this approach, called ensemble-based DRL, Cuayáhuitl and his colleagues developed a chatbot composed of multiple agents running in parallel. All of these agents provide a response to each user query, but only the best, most humanlike response (i.e., the one with the highest predicted dialogue reward) is selected and sent back to the user.
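The selection step can be sketched in a few lines. Everything below is hypothetical: the three "agents" simply return fixed strings, and `toy_reward` is a crude stand-in for a trained reward predictor that scores a response by the share of its words found in a small known vocabulary.

```python
def select_response(candidates, predict_reward):
    """Return the candidate with the highest predicted dialogue reward."""
    return max(candidates, key=predict_reward)

# Hypothetical vocabulary used by the stand-in reward predictor.
KNOWN_WORDS = {"i", "am", "fine", "thanks", "how", "are", "you"}

def toy_reward(response):
    """Map the fraction of known words from [0, 1] to a score in [-1, 1]."""
    words = response.lower().split()
    if not words:
        return -1.0
    known = sum(w in KNOWN_WORDS for w in words)
    return 2.0 * known / len(words) - 1.0

# Each hypothetical agent proposes one response to the same user query.
agents = [
    lambda query: "fine zzz qqq",
    lambda query: "i am fine thanks",
    lambda query: "xx yy",
]
candidates = [agent("how are you") for agent in agents]
best = select_response(candidates, toy_reward)
print(best)
```

Only the second candidate consists entirely of known words, so it receives the top score and is the one returned to the user.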
Predicted dialogue rewards
The ‘predicted dialogue rewards’ are awarded by a dialogue reward generator, an algorithm trained on a dataset of humanlike and non-humanlike dialogues. Essentially, the more noise (i.e., unwanted/random word sequences) contained in a chatbot’s response to a query, the lower the numerical reward it receives.
“In our ensemble, each chatbot uses its own dialogue history (sentence vectors), selects an action (sentence), receives a reward (+1 if humanlike sentence, –1 if non-humanlike sentence), and this is done iteratively until convergence (for example, until no more improvement in average reward),” Cuayáhuitl said. “We found that this approach does better than single deep reinforcement/supervised learning agents.”
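The loop Cuayáhuitl describes (select a sentence, receive +1 or -1, repeat until the average reward stops improving) can be sketched as follows. This is a deliberately simplified, single-agent version under stated assumptions: it uses one-step value estimates in a dictionary rather than a neural network over sentence vectors, and the contexts, candidate sentences, learning rate and stopping tolerance are invented for the example.

```python
def reward(sentence, humanlike):
    """+1 for a humanlike sentence, -1 otherwise."""
    return 1.0 if sentence in humanlike else -1.0

def train(contexts, humanlike, alpha=0.5, tolerance=1e-6, max_epochs=100):
    """Train until the average reward per epoch stops improving."""
    values = {}    # (context, sentence) -> estimated reward
    history = []   # average reward of each epoch
    for _ in range(max_epochs):
        total = 0.0
        for context, options in contexts.items():
            # Greedy choice; ties go to the first option in the list.
            choice = max(options, key=lambda s: values.get((context, s), 0.0))
            r = reward(choice, humanlike)
            old = values.get((context, choice), 0.0)
            values[(context, choice)] = old + alpha * (r - old)
            total += r
        history.append(total / len(contexts))
        # Stop once an epoch brings no improvement over the previous one.
        if len(history) > 1 and history[-1] - history[-2] <= tolerance:
            break
    return values, history

# Hypothetical data: each context offers one humanlike and one random sentence.
contexts = {
    "how are you": ["purple monday seven", "i am fine thanks"],
    "what is your name": ["i am a chatbot", "lamp river quickly"],
}
humanlike = {"i am fine thanks", "i am a chatbot"}

values, history = train(contexts, humanlike)
print(history)
```

In this toy run the agent picks a random sentence once, is penalized, and switches to the humanlike one in the next epoch, after which the average reward plateaus and training stops.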
The researchers evaluated their multi-chatbot system in a series of tests and found that it performed better than single-agent systems, though it is still far from achieving smooth, humanlike dialogues. Even so, their findings suggest that training multiple chatbot agents rather than a single one can improve machine-human communication.
Cuayáhuitl and his colleagues also evaluated the dialogue reward generator used to rate agent answers, comparing its ratings for specific responses to those given by humans. They found a strong correlation between human ratings and those produced by the reward generator. Their reward scheme could therefore help to train chatbots without requiring humans to label the training data, thus sparing researchers considerable time and effort.
“The key contribution of our paper is that it introduces the idea of using a multi-chatbot approach instead of a single-chatbot approach and rewarding a chatbot using numerical rewards from humanlike and non-humanlike dialogues,” Cuayáhuitl added.
Striving towards human conversation
The recent study carried out by Cuayáhuitl and his colleagues highlights the huge potential of multi-chatbot systems, suggesting that they could enable more natural and engaging human-machine interactions. However, as dialogues between human users and chatbots can be unpredictable and different from those used to train deep learning agents, finding a way to generalize their approach to different conversational partners remains a key challenge.
“In our future work, we would like to develop cutting-edge, real-world applications,” Cuayáhuitl said. “For example, we are currently training a humanoid robot to play and chat in an interleaved way for increasing naturalness in human-robot interactions.”
Illustration of ensemble-based deep reinforcement learning for chatbots
Ensemble-based deep reinforcement learning for chatbots, Neurocomputing, DOI: 10.1016/j.neucom.2019.08.007. https://www.sciencedirect.com/science/article/pii/S0925231219311269