Ensemble Model for Educational Data Mining based on Synthetic Minority Oversampling Technique

International Journal of Electronics and Communication Engineering
© 2025 by SSRG - IJECE Journal
Volume 12 Issue 9
Year of Publication: 2025
Authors: R. Manoharan, M. Subi Stalin, Ganesh Babu Loganathan, K. Venkateswaran
How to Cite?
R. Manoharan, M. Subi Stalin, Ganesh Babu Loganathan, K. Venkateswaran, "Ensemble Model for Educational Data Mining based on Synthetic Minority Oversampling Technique," SSRG International Journal of Electronics and Communication Engineering, vol. 12, no. 9, pp. 219-234, 2025. Crossref, https://doi.org/10.14445/23488549/IJECE-V12I9P119
Abstract:
Educational Data Mining (EDM) is a growing field that applies data mining, statistical analysis, and machine learning techniques to student-related data. Existing EDM approaches often rely on manual statistical methods, which are time-consuming and adapt poorly to dynamic educational environments. To address these limitations, this paper proposes an ensemble-based framework that integrates machine learning classifiers with statistical approaches for student performance classification. To improve predictive accuracy, the model combines five classifiers: Decision Tree, Logistic Regression, Random Forest, Multilayer Perceptron, and K-Nearest Neighbor. Because educational data are inherently class-imbalanced, the Synthetic Minority Oversampling Technique (SMOTE) is applied to balance the dataset and improve classifier performance. The proposed model is evaluated on a real-world dataset of 6,807 student records collected from a technological college in India. Performance is assessed with eight evaluation metrics to identify the most effective configuration. The results demonstrate the model’s capability to deliver accurate and fair classification, supporting data-driven educational decision-making.
Keywords:
Educational Data Mining, Ensemble method, SMOTE, Machine Learning.
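
The two ideas named in the abstract, SMOTE oversampling and combining several classifiers, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `smote` and `majority_vote`, the default `k=5`, and the use of hard (majority) voting are assumptions made here for illustration, since the abstract does not state how the classifiers are combined or how SMOTE is configured.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points (hypothetical helper).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def majority_vote(predictions):
    """Hard-voting ensemble (an assumed combination rule).

    predictions[i][j] is the label that classifier i assigns to sample j.
    """
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


# Toy data, invented for illustration only
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
new_points = smote(minority, n_new=4, k=2)
votes = majority_vote([["pass", "fail"], ["pass", "pass"], ["fail", "fail"]])
```

In practice, oversampling is usually applied to the training split only, so that synthetic records do not leak into the data used for evaluation.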