Measuring Chatbot Performance
By Ingrid Fadelli
March 3, 2020
A growing number of individuals and organizations worldwide now use chatbots to facilitate various processes, including customer service and e-shopping assistance.
Assessing chatbot performance, however, isn’t always as straightforward as evaluating other technological tools, as it often entails trying to understand how a user perceives the agent.
Measuring chatbot performance
In a recent study, a team of researchers at Kozminski University, SWPS University, the MIT Center for Collective Intelligence, and the University of Warsaw developed a new method to track human-chatbot interactions and measure chatbot performance.
This new technique, presented in a paper published in Elsevier’s Business Horizons journal, specifically focuses on ethical concerns, such as human trust in chatbots.
“As a research team, we were for a long time interested in chatbots as an interesting case of human-machine interaction patterns,” Aleksandra Przegalinska, one of the researchers who carried out the study, told discover.bot.
Are bots useful to users?
In a previous study, the same team of researchers tried to link chatbots with affective computing in order to investigate which types of bots are useful to users and which are not. They specifically examined this in relation to human psychophysiological responses, such as stress and fear.
Their new study, on the other hand, investigated chatbots from a more ethical perspective, aiming to uncover the values they present in their dialogues with humans.
Starting from the assumption that trust is a core element for enabling successful human-chatbot interactions, the researchers tried to assess how trust as a category is being redefined by the advent of chatbots and deep learning–based conversational agents.
“In this research, we very much focused on researching chatbots’ integrity in terms of values represented by them in conversations with humans,” Przegalinska said.
“For this purpose, we used Peter Gloor's Condor Tribefinder system, which can be used to detect various relevant categories, such as ideology, lifestyle, etc.”
Przegalinska and her colleagues developed a new method to evaluate chatbot performance that links neuroscientific techniques with text mining and machine learning.
The system they used to examine human-chatbot interactions, called Tribefinder, was developed by Gloor, one of the team members.
Reading social patterns
Tribefinder is a machine learning–based text-mining technique for revealing Twitter users’ so-called “tribal affiliations” (i.e., specific social groups they belong to) based on their social media activity.
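The article does not describe Tribefinder's internals, but the general idea, classifying short texts into predefined "tribes," can be sketched with standard tools. The following is a minimal illustration only; the tribe labels, example texts, and model choice here are hypothetical and are not the actual Tribefinder system:

```python
# Minimal sketch of tribe-style text classification. This is NOT the
# actual Tribefinder implementation; its model, categories, and training
# data are not described in the article. All labels/texts are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: short texts labeled with a "tribe."
train_texts = [
    "Sources confirmed the report after independent verification.",
    "The facts speak for themselves; here is the evidence.",
    "Together we will deliver exactly what you have been asking for.",
    "I hear your concerns and I promise change is coming.",
]
train_labels = ["journalist", "journalist", "politician", "politician"]

# TF-IDF features plus logistic regression: a common text-classification
# baseline that assigns each message to its most likely tribe.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(train_texts, train_labels)

# Classify a new chatbot utterance into the most likely tribe.
print(classifier.predict(["Here is the verified data behind that claim."]))
```

In practice, a system like this would be trained on far larger corpora of tribe-labeled social media text; the sketch only shows the classification pattern the researchers describe.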
Using this method, the researchers tried to gain a better understanding of “tribal languages” that specific groups of people use when interacting with chatbots.
“We also investigated whether a chatbot and humans interacting with it started mirroring each other’s word usage by using words representative of the same digital tribe,” Gloor said.
“One example of a digital tribe was the journalist, who would use honest language to report the truth. The opposite category was the politician, who would say what others wanted to hear rather than the truth. We found that people interacted with the chatbot 100 percent as journalists, while 15 percent of the time the chatbot replied using words typical of a politician.”
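The article does not specify how these percentages were computed. One straightforward reading, assumed here purely for illustration, is that each conversational turn is classified into a tribe and the shares are tallied per speaker:

```python
# Hypothetical tally of per-turn tribe labels, e.g. as produced by a
# classifier like the sketch above. The paper's actual pipeline is not
# detailed in the article; this data is invented for illustration.
from collections import Counter

human_turns = ["journalist", "journalist", "journalist", "journalist"]
bot_turns = ["journalist", "journalist", "journalist", "politician"]

def tribe_shares(labels):
    """Return the fraction of turns assigned to each tribe."""
    counts = Counter(labels)
    return {tribe: count / len(labels) for tribe, count in counts.items()}

print("human:", tribe_shares(human_turns))  # 100% journalist
print("bot:  ", tribe_shares(bot_turns))    # 25% politician in this toy data
```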
The values of a bot
The new technique developed by the researchers can be used to assess a chatbot's performance based on the values it seems to represent when interacting with humans.
These values are considered particularly relevant because they can increase a human user’s trust in chatbots, which could ultimately result in more effective human-chatbot interactions.
“The novelty lies mainly in the proposed methodology, which consists of analyzing the content of messages produced in human-chatbot interactions using Condor Tribefinder,” Przegalinska said.
“The system was developed for text mining and is based on a machine learning classification engine. We hope that our results will help to develop better social bots in business or commercial environments.”
Moving forward
In the future, the team's findings could inform the development of new chatbots designed around specific values, making them more likely to gain human trust.
In addition, the methodology they proposed could be used to evaluate the performance of specific chatbots based on ethical characteristics associated with their communication style.
“We are now working on a new project that investigates the issue of robotization (shared human-machine work environments),” Przegalinska said.
“In this new study, we wish to explore whether using AI can affect effectiveness and satisfaction in work environments.”