Comparative Analysis of Machine Learning and Deep Learning Models for Sentiment Analysis in Somali Language

International Journal of Electrical and Electronics Engineering
© 2023 by SSRG - IJEEE Journal
Volume 10 Issue 7
Year of Publication : 2023
Authors : Abdullahi Ahmed Abdirahman, Abdirahman Osman Hashi, Ubaid Mohamed Dahir, Mohamed Abdirahman Elmi, Octavio Ernest Romo Rodriguez
How to Cite?

Abdullahi Ahmed Abdirahman, Abdirahman Osman Hashi, Ubaid Mohamed Dahir, Mohamed Abdirahman Elmi, Octavio Ernest Romo Rodriguez, "Comparative Analysis of Machine Learning and Deep Learning Models for Sentiment Analysis in Somali Language," SSRG International Journal of Electrical and Electronics Engineering, vol. 10, no. 7, pp. 41-52, 2023. Crossref, https://doi.org/10.14445/23488379/IJEEE-V10I7P104

Abstract:

Understanding and analysing sentiment in user-generated content has become crucial with the increasing use of social media and online platforms. However, sentiment analysis in less-resourced languages such as Somali poses unique challenges. This paper compares the performance of three machine learning (ML) algorithms, a decision tree classifier (DTC), a random forest classifier (RFC), and XGBoost (XGB), and two deep learning (DL) models, a convolutional neural network (CNN) and a long short-term memory network (LSTM), in classifying sentiment in Somali text. The CC100-Somali dataset, comprising 78M monolingual Somali texts from Common Crawl snapshots, is used for training and evaluation. The study uses train-test splits and cross-validation to assess classification accuracy and other performance metrics. DTC achieved the highest accuracy among the ML algorithms (87.94%), while LSTM achieved the highest accuracy among the DL models (88.58%). These findings contribute to sentiment analysis in less-resourced languages, specifically Somali, and indicate that both ML and DL approaches can analyse sentiment in Somali text effectively. The results and evaluation metrics provide a benchmark for future research in sentiment analysis for Somali and other low-resource languages.
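
The evaluation procedure named in the abstract (a held-out test split plus cross-validation) can be illustrated with a minimal Python sketch. The abstract does not state the split ratio, the number of folds, or the feature pipeline, so the 80/20 split, the five folds, the placeholder data, and the majority-class stand-in model below are all assumptions made for illustration and are not the authors' setup.

```python
import random
from collections import Counter

def train_test_split(items, test_ratio=0.2, seed=0):
    """Shuffle a copy of items and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

def k_fold(items, k=5):
    """Yield (train, validation) pairs; each item is validated exactly once."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]

def fit_majority(train):
    """Stand-in model (assumption): remember the most frequent training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]

def accuracy(model, data):
    """Fraction of examples whose label equals the model's constant prediction."""
    return sum(label == model for _, label in data) / len(data)

# Hypothetical placeholder data, not drawn from CC100-Somali.
data = [(f"text {i}", "positive" if i % 3 else "negative") for i in range(30)]

# Hold out a test set, then cross-validate on the remaining training data.
train, test = train_test_split(data)
cv_scores = [accuracy(fit_majority(tr), va) for tr, va in k_fold(train, k=5)]
mean_cv = sum(cv_scores) / len(cv_scores)
test_acc = accuracy(fit_majority(train), test)
```

Replacing `fit_majority` and `accuracy` with a real classifier's training and prediction steps leaves the splitting logic unchanged, which is the point of the sketch: the test split is touched only once, after cross-validation on the training portion.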

Keywords:

Somali language, Sentiment analysis, Machine learning, Deep learning, Somali dataset.
