Improving Sentiment Analysis on Imbalanced Airlines Twitter Data Using DSMOTE Technique

International Journal of Electronics and Communication Engineering
© 2023 by SSRG - IJECE Journal
Volume 10 Issue 9
Year of Publication : 2023
Authors : Shiramshetty Gouthami, Nagaratna P. Hegde
How to Cite?

Shiramshetty Gouthami, Nagaratna P. Hegde, "Improving Sentiment Analysis on Imbalanced Airlines Twitter Data Using DSMOTE Technique," SSRG International Journal of Electronics and Communication Engineering, vol. 10, no. 9, pp. 38-51, 2023. Crossref, https://doi.org/10.14445/23488549/IJECE-V10I9P105

Abstract:

Social media and microblogging have recently grown in popularity and traffic, and analysing the tweets that customers send to US airlines is time-consuming. This paper presents a sentiment analysis model for imbalanced datasets that builds on the SMOTE method. Unlike standard SMOTE, which treats all minority-class samples equally, the proposed over-sampling technique synthesises more samples near cases that are easily misclassified, targeting misclassified minority-class samples to improve accuracy. The model classifies tweet sentiment in several steps. First, special characters, URLs, and stop words are removed from the tweets. Features are then extracted from the cleaned tweets to create numerical feature vectors. The Bag of Words (BoW) model builds a lexicon from all unique terms in the tweets and represents each tweet numerically by the presence or absence of these words. Once the tweets have been transformed into feature vectors, Random Forest (RF) and Recurrent Neural Network (RNN) classification models are trained on them. Random Forest is an ensemble learning method that classifies using many decision trees, while RNNs process sequential text using internal memory states. Both models learn patterns that link features to sentiment labels and can then label new tweets as positive, negative, or neutral, providing valuable sentiment analysis insights. The results obtained with density-based SMOTE show the model's efficiency: Random Forest with the TF-IDF vectoriser achieves 81% accuracy and a 70% F1 score. These measures show that the model can classify sentiment in imbalanced datasets, making it useful for sentiment analysis.
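
The steps named in the abstract (cleaning, binary Bag of Words encoding, and over-sampling that favours hard minority cases) can be illustrated with a small sketch. The abstract does not specify the density-based SMOTE algorithm itself, so the weighting rule below is an assumption chosen only for illustration: minority tweets with more other-class neighbours are picked more often as interpolation seeds. The stop-word list, the sample tweets, and all function names are likewise hypothetical.

```python
import random
import re

# Tiny illustrative stop-word list (an assumption; the paper does not give one).
STOP_WORDS = {"a", "an", "the", "is", "to", "my", "was", "and", "for", "on"}

def clean_tweet(text):
    """Lowercase, strip URLs and special characters, and drop stop words."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def build_vocabulary(token_lists):
    """Sorted lexicon of every unique term across all tweets."""
    return sorted({tok for tokens in token_lists for tok in tokens})

def bow_vector(tokens, vocabulary):
    """Binary Bag of Words: 1.0 if the term is present, else 0.0."""
    present = set(tokens)
    return [1.0 if term in present else 0.0 for term in vocabulary]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def weighted_oversample(X, y, minority, n_new, k=3, seed=0):
    """SMOTE-style interpolation between minority samples.

    Seeds are drawn with weight 1 + (number of other-class points among
    the k nearest neighbours). This weighting is an illustrative stand-in,
    not the authors' method. Requires at least two minority samples.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    weights = []
    for i in minority_idx:
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sq_dist(X[i], X[j]),
        )[:k]
        foreign = sum(1 for j in neighbours if y[j] != minority)
        weights.append(1 + foreign)  # +1 keeps every minority point eligible
    synthetic = []
    for _ in range(n_new):
        i = rng.choices(minority_idx, weights=weights, k=1)[0]
        j = rng.choice([m for m in minority_idx if m != i])
        gap = rng.random()  # position along the segment from X[i] to X[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic

# Hypothetical, hand-written tweets: three negative, two positive.
tweets = [
    ("@united my flight was delayed again http://t.co/x", "negative"),
    ("@united lost my bag, terrible service!", "negative"),
    ("@delta delayed and rude staff", "negative"),
    ("@jetblue great crew, thanks!", "positive"),
    ("@delta thanks for the smooth flight", "positive"),
]
token_lists = [clean_tweet(text) for text, _ in tweets]
labels = [label for _, label in tweets]
vocab = build_vocabulary(token_lists)
X = [bow_vector(tokens, vocab) for tokens in token_lists]
new_rows = weighted_oversample(X, labels, "positive", n_new=2)
```

The synthetic rows would be appended to the training set with the minority label before a classifier such as Random Forest is fitted. Because interpolation produces fractional values, the augmented vectors are no longer strictly binary, which is one reason weighted encodings such as TF-IDF are often paired with SMOTE-style methods.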

Keywords:

Sentiment analysis, Class imbalance, Tweets, SMOTE, Classification.
