Trustworthiness Metrics for Measuring Efficiency of Chatbots - A Systematic Review

International Journal of Electrical and Electronics Engineering
© 2025 by SSRG - IJEEE Journal
Volume 12 Issue 4
Year of Publication : 2025
Authors : P. Sowmya, Vasudeva, Manjula Gururaj Rao
How to Cite?

P. Sowmya, Vasudeva, Manjula Gururaj Rao, "Trustworthiness Metrics for Measuring Efficiency of Chatbots - A Systematic Review," SSRG International Journal of Electrical and Electronics Engineering, vol. 12, no. 4, pp. 132-142, 2025. Crossref, https://doi.org/10.14445/23488379/IJEEE-V12I4P109

Abstract:

A chatbot acts as an AI-based virtual assistant in many applications, such as websites, banking apps, and customer support systems. It uses Artificial Intelligence (AI) to respond to users' queries without human intervention. In an application that may offer hundreds of options, searching for a specific one becomes a hassle for the user. A chatbot can solve this problem: the user simply chats with the bot and the task gets done. However, with so many AI applications available, it becomes critical to determine whether a given AI application or tool is trustworthy. This article focuses on the evaluation metrics, ethical concerns, and trustworthiness of AI applications, which help predict the efficiency of different AI-based chatbot systems.

Keywords:

Artificial Intelligence, Chatbot, Evaluation, Natural Language Processing, Trust.
