Automating the Measurement of Heterogeneous Chatbot Designs (paper review)

Paper from: http://miso.es/pubs/ACMSAC_2022.pdf

Chatbots have gained importance in recent years, which has led to the development of several chatbot platforms (such as Amazon Lex, Google DialogFlow and IBM Watson). However, only a limited number of studies address the quality assurance of chatbots. The paper by Pablo C. Cañizares, Sara Pérez-Soler, Esther Guerra and Juan de Lara addresses exactly this problem: how to guide the testing of chatbots by using design metrics.

The paper proposes six global metrics (e.g., number of intents of the bot), eight intent metrics (e.g., number of training phrases per intent), three entity metrics (e.g., word length), and three flow metrics (e.g., conversation length). By measuring these metrics, chatbot designers can predict three aspects of usability: effectiveness, efficiency and satisfaction. To support the measurement process, the paper presents a tool, available on GitHub, which practitioners can use. For some of the metrics, the tool employs machine learning and natural language processing. The metrics and the tool are evaluated on twelve chatbot designs, where the tool was able to identify quality issues related to readability, conversation complexity, user experience and bot understanding. This demonstrates the usefulness of the tool in practice and shows how the metrics can help software developers design high-quality bots.

The metrics from the paper are:

Global metrics:
INT – # intents
ENT – # user-defined entities
FLOW – # conversation entry points
PATH – # different conversation flow paths
CNF – # confusing phrases
SNT – # positive, neutral, negative output phrases

Intent metrics:
TPI – # training phrases per intent
WPTP – # words per training phrase
VPTP – # verbs per training phrase
PPTP – # parameters per training phrase
WPOP – # words per output phrase
VPOP – # verbs per output phrase
CPOP – # characters per output phrase
READ – reading time of the output phrases

Entity metrics:
LPE – # literals per entity
SPL – # synonyms per literal
WL – word length

Flow metrics:
FACT – # actions per flow
FPATH – # conversation flow paths
CL – conversation length
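
To get a feeling for how simple some of these metrics are to compute, here is a minimal Python sketch for a few of the size-based ones (INT, ENT, TPI, WPTP, CPOP, LPE, SPL). Note the assumptions: the `design` dictionary is a hypothetical, simplified representation that I made up for illustration, not the input format of the tool from the paper, and averaging the per-intent and per-entity values over the whole bot is my own reading of the metric names. Metrics that need NLP or machine learning (such as VPTP, CNF or SNT) are left out.

```python
from statistics import mean

# Hypothetical, simplified chatbot design (NOT the format used by the paper's tool).
# Each entity maps a literal to its list of synonyms.
design = {
    "intents": {
        "order_pizza": {
            "training_phrases": ["I want a pizza", "order a large pizza please"],
            "output_phrases": ["Which size would you like?"],
        },
        "greet": {
            "training_phrases": ["hello", "hi there"],
            "output_phrases": ["Hello!", "Hi, how can I help?"],
        },
    },
    "entities": {
        "size": {"small": ["little"], "large": ["big", "huge"]},
    },
}


def size_metrics(design):
    """Compute a few size-based design metrics, averaged over the whole bot."""
    intents = design["intents"]
    entities = design["entities"]

    # Flatten phrases and synonym lists across all intents and entities.
    training = [p for i in intents.values() for p in i["training_phrases"]]
    outputs = [p for i in intents.values() for p in i["output_phrases"]]
    synonym_lists = [syns for e in entities.values() for syns in e.values()]

    return {
        "INT": len(intents),                                   # number of intents
        "ENT": len(entities),                                  # number of user-defined entities
        "TPI": mean(len(i["training_phrases"]) for i in intents.values()),
        "WPTP": mean(len(p.split()) for p in training),        # whitespace-separated words
        "CPOP": mean(len(p) for p in outputs),                 # characters incl. spaces
        "LPE": mean(len(e) for e in entities.values()),        # literals per entity
        "SPL": mean(len(s) for s in synonym_lists),            # synonyms per literal
    }


print(size_metrics(design))
```

Even such a rough calculation can flag suspicious spots in a design, for example an intent with very few training phrases or very long output phrases.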

I will try to use these metrics if I ever write a chatbot 🙂

Author: Miroslaw Staron

I’m a professor in Software Engineering at Computer Science and Engineering. I usually blog about articles I find interesting and my own reflections on the development of Software Engineering, AI, computer science and automotive software.