Analysis of Offensive Data over Multi-Source Social Media Environment Using Modified Random Forest Algorithm

International Journal of Electronics and Communication Engineering
© 2023 by SSRG - IJECE Journal
Volume 10 Issue 9
Year of Publication : 2023
Authors : Uma Maheswari. V, R Priya
How to Cite?

Uma Maheswari. V, R Priya, "Analysis of Offensive Data over Multi-Source Social Media Environment Using Modified Random Forest Algorithm," SSRG International Journal of Electronics and Communication Engineering, vol. 10, no. 9, pp. 63-71, 2023. Crossref, https://doi.org/10.14445/23488549/IJECE-V10I9P107

Abstract:

The widespread use of social media platforms has produced a growing volume of offensive content, which makes it difficult to maintain a safe and respectful online environment. This research analyzes offensive data in the social media environment using a modified Random Forest algorithm. The modification to the traditional Random Forest algorithm is a Weighted class Random Forest (WRF), intended to enhance model diversity and robustness. The algorithm applies class weights during training to address the class imbalance inherent in offensive data. By assigning higher weights to offensive content, the model prioritizes the accurate identification of offensive posts, comments, and messages. The modified Random Forest model is trained and validated on multi-source social media content from Twitter and Reddit, labeled as offensive or non-offensive. The proposed model is compared with the Decision Tree (DT), Extreme Gradient Boosting (XGBoost), Multi-layer Perceptron (MLP), K-Nearest Neighbors (KNN), and traditional Random Forest (RF) machine learning algorithms. Its effectiveness on offensive data is assessed with several performance metrics: accuracy, recall, precision, specificity, and F1-score. The results demonstrate that the modified Random Forest algorithm outperforms the other machine learning algorithms. Moreover, the model shows improved resilience to variations in offensive language and context, making it more suitable for real-world applications.
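The class weighting and the evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the exact weighting scheme, so the inverse-frequency rule used here is an assumption, and the function names `inverse_frequency_weights` and `binary_metrics` and the toy labels are hypothetical. The sketch shows only how a minority (offensive) class could receive a larger weight and how the five reported metrics are derived from a confusion matrix; it is not the authors' implementation.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed weighting rule: n_samples / (n_classes * class_count).

    Rare classes (here, offensive posts) receive larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, specificity and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Toy data (hypothetical): 1 = offensive, 0 = non-offensive.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

weights = inverse_frequency_weights(y_true)
metrics = binary_metrics(y_true, y_pred)
print(weights)
print(metrics)
```

In a library such as scikit-learn, a weight dictionary of this kind would typically be passed to the random forest through its `class_weight` parameter, so that misclassified offensive samples count more heavily when trees are grown.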

Keywords:

Social media, Offensive data, Content moderation, Machine learning, Modified random forest algorithm, Weighted class Random Forest.
