In the previous post, we discussed the first steps of the AI training process. In this post, we will look at the training phases that follow once the conversational AI is already up and running. In some ways, AI trainers are the parents of chatbots: it takes skillful parenting for a child to grow up and lead a successful, independent life, and the same applies to chatbots. You do not want to underdo the training, nor should you overdo it. We have found that training and analyzing about 10% of the monthly conversations is sufficient for the positive evolution of a conversational AI. Anything less might not paint a full picture of the current situation. On the other hand, overtraining can lead to data imbalance, which can cause unwanted biases and inaccurate decision-making.
People frequently assume that AI training will end up being too time-consuming. This is not the case. Training 1,000-2,000 phrases or individual customer inputs every month is sufficient for a typical chatbot that processes approximately 20,000 conversations per month. This can usually be done in a few days, especially if you have a simple phrase-and-intent matching tool available.
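To make the arithmetic concrete, here is a minimal Python sketch of the quota heuristic described above. The function name and the 5-10% share are illustrative assumptions, not the output of any real tool:

```python
# Minimal sketch of the monthly training quota heuristic described above.
# The share values and the function name are illustrative assumptions.

def monthly_training_quota(conversations_per_month: int, share: float = 0.10) -> int:
    """How many customer phrases to review and train each month."""
    return round(conversations_per_month * share)

# A typical chatbot handling ~20,000 conversations per month:
print(monthly_training_quota(20_000))        # 2000 phrases (upper end)
print(monthly_training_quota(20_000, 0.05))  # 1000 phrases (lower end)
```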
Furthermore, the AI can learn from live chat software and be trained through it, so every time a live agent is working, the chatbot gets a little smarter. The virtual assistant sits alongside the live agent, suggesting the most likely answers, which the human agent can easily reuse. In this manner, the AI helps live agents respond faster and gains intent-matching accuracy in return.
In addition to training, it is equally important to analyze the chatbot's performance. The most obvious way to measure it is to calculate the percentage of correct predictions out of the total analyzed phrases and compare how it changes over time. For example, if you analyze 1,000 bot predictions and validate 700 of them, you can assume that the chatbot currently understands intents correctly 70% of the time. While linking unique phrases to topics, you should also try to understand which topics, or groups of topics, are troublesome for the AI. This makes it possible to quickly add new topics to the system. The analysis should also cover the latest trends in how people phrase their questions and requests.
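As a simple illustration of this metric, the Python sketch below computes overall and per-topic accuracy from validation records. The record format and function names are hypothetical, not part of any specific tool:

```python
from collections import defaultdict

# Hypothetical validation records a trainer might export:
# (predicted_topic, was_prediction_correct) pairs.
validations = [
    ("Open an account", True),
    ("Open an account", False),
    ("Order new card", True),
    ("Order new card", True),
]

def overall_accuracy(records):
    """Share of correct predictions among all analyzed phrases."""
    return sum(ok for _, ok in records) / len(records)

def accuracy_by_topic(records):
    """Per-topic accuracy, to spot which topics trouble the AI."""
    totals, correct = defaultdict(int), defaultdict(int)
    for topic, ok in records:
        totals[topic] += 1
        correct[topic] += ok
    return {topic: correct[topic] / totals[topic] for topic in totals}

print(f"overall: {overall_accuracy(validations):.0%}")  # 75%
for topic, acc in accuracy_by_topic(validations).items():
    print(f"{topic}: {acc:.0%}")
```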
For example, take a look at the picture of our AI training tool below. Trainers can easily pick the date range for which the data is displayed. From there, they can check the chatbot's ability to match customer enquiries to the correct topics and validate the accurate predictions. When the predicted intent is wrong, the correct option can be chosen from a dropdown menu. In some cases, the ignore button should be used instead: a phrase should be ignored if it is a duplicate or a spam message. This helps preserve the AI's intent-matching accuracy and prevents extra work in later stages (such as training-data cleaning).
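To show how these three trainer actions (validate, correct, ignore) might map to data, here is a small Python sketch. The class and field names are hypothetical and do not reflect the actual AlphaAI Inbox schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

class Action(Enum):
    VALIDATE = "validate"  # the predicted topic was correct
    CORRECT = "correct"    # the right topic was picked from the dropdown
    IGNORE = "ignore"      # duplicate or spam message

@dataclass
class Review:
    phrase: str
    predicted_topic: str
    action: Action
    corrected_topic: Optional[str] = None  # set only when action is CORRECT

def to_training_example(review: Review) -> Optional[Tuple[str, str]]:
    """Turn a review into a (phrase, topic) pair, or drop ignored
    phrases so duplicates and spam never pollute the training data."""
    if review.action is Action.IGNORE:
        return None
    topic = (review.corrected_topic
             if review.action is Action.CORRECT
             else review.predicted_topic)
    return (review.phrase, topic)
```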
Picture 1. AlphaAI Inbox training tool
Finally, bigger companies with complex customer service systems need to think carefully about the structure of their chatbot. It is vital to narrow down the total number of topics and to group similar ones. A handy way to solve the too-many-topics problem is to generate menus for large topic groups. That said, if a company still needs 1,000 different topics with 100 phrases linked to each of them, the training data must be organized very precisely.
It is highly unlikely that every 100-phrase cluster (one topic) contains only sentences unique to that topic. Hypothetically, let's imagine a scenario with two topics: "Open an account" and "Order new card". Both will most likely consist of many variations such as "I want to open an account", "I need to reopen my account", "I want to order a new plastic card", "I lost my old card. I need to order a new one", et cetera. The keyword "account" will most likely point the AI to the "Open an account" topic. The same goes for the keyword "card", which is specific enough for the AI to immediately recognize the phrase as "Order new card". In this case, the other components of the sentences, such as "I want to" and "I need to", are irrelevant in the grand scheme of things, because each phrase contains something unique in addition to those general components. Now imagine the same situation with 1,000 topics and overlapping training data. Accuracy will probably suffer if the bot has to analyze phrases that seem to fit under many different topics. For this reason, we suggest having less training data and making sure that every character of it represents a specific topic.
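As a toy illustration of this overlap problem, the Python sketch below splits the example phrases into tokens and checks which ones actually discriminate between the two topics. The tokenization is deliberately naive, and the data is just the example above:

```python
from collections import defaultdict

# Toy training data mirroring the two-topic example above.
training_data = {
    "Open an account": [
        "I want to open an account",
        "I need to reopen my account",
    ],
    "Order new card": [
        "I want to order a new plastic card",
        "I lost my old card. I need to order a new one",
    ],
}

# For each token, record which topics it appears under. Tokens shared by
# both topics ("I want to", "I need to") carry no discriminative signal;
# tokens unique to one topic ("account", "card") do the real work.
token_topics = defaultdict(set)
for topic, phrases in training_data.items():
    for phrase in phrases:
        for token in phrase.lower().replace(".", "").split():
            token_topics[token].add(topic)

discriminative = sorted(t for t, s in token_topics.items() if len(s) == 1)
shared_filler = sorted(t for t, s in token_topics.items() if len(s) > 1)
print("discriminative:", discriminative)  # includes 'account' and 'card'
print("shared filler: ", shared_filler)   # includes 'i', 'want', 'need', 'to'
```

With 1,000 topics, the share of filler tokens grows while the unique ones get diluted, which is exactly why tightly curated training data tends to beat sheer volume.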