Hybrid Sampling Approach for Multiclass Imbalanced Data

International Journal of Electronics and Communication Engineering
© 2026 by SSRG - IJECE Journal
Volume 13 Issue 1
Year of Publication: 2026
Authors: Madhura Prabha, Sasikala
How to Cite?

Madhura Prabha, Sasikala, "Hybrid Sampling Approach for Multiclass Imbalanced Data," SSRG International Journal of Electronics and Communication Engineering, vol. 13,  no. 1, pp. 18-26, 2026. Crossref, https://doi.org/10.14445/23488549/IJECE-V13I1P102

Abstract:

Machine learning classifiers can produce skewed results when trained on imbalanced data. In scenarios such as machine fault detection, fraud detection, and disease diagnosis, instances of the class of interest are often far fewer than those of the remaining classes. The classifier is then dominated by the majority classes, yielding skewed predictions and, in particular, a high false-negative rate. This paper proposes Stratified Near Miss Undersampling with a Deep Neural Network (DeepSNMU) to handle imbalanced datasets on the Spark framework. DeepSNMU applies IQR-based outlier detection for pre-processing, near-miss undersampling to balance the dataset, and a Deep Neural Network for classification. Experiments were conducted on four highly imbalanced, multi-class datasets from the KEEL repository. DeepSNMU achieves high classification accuracy; its results are compared with existing balancing techniques and classifiers, and it predicts the minority class of interest more accurately than the existing methods.
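The two balancing steps named in the abstract (IQR outlier removal, then near-miss undersampling) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes a NearMiss-1-style rule (keep the majority samples closest on average to their nearest minority samples) applied to a binary minority-vs-rest split, whereas the paper addresses the full multi-class, Spark-based setting.

```python
import numpy as np

def iqr_filter(X, y, k=1.5):
    """Drop rows where any feature lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = np.percentile(X, 25, axis=0)
    q3 = np.percentile(X, 75, axis=0)
    iqr = q3 - q1
    mask = np.all((X >= q1 - k * iqr) & (X <= q3 + k * iqr), axis=1)
    return X[mask], y[mask]

def near_miss_1(X, y, minority_label, n_neighbors=3):
    """NearMiss-1 sketch: keep the majority samples whose mean distance to
    their n_neighbors nearest minority samples is smallest, until the
    majority count matches the minority count."""
    X_min = X[y == minority_label]
    X_maj = X[y != minority_label]
    y_maj = y[y != minority_label]
    # Pairwise distances from each majority sample to each minority sample.
    d = np.linalg.norm(X_maj[:, None, :] - X_min[None, :, :], axis=2)
    mean_nearest = np.sort(d, axis=1)[:, :n_neighbors].mean(axis=1)
    keep = np.argsort(mean_nearest)[: len(X_min)]
    X_bal = np.vstack([X_min, X_maj[keep]])
    y_bal = np.concatenate([np.full(len(X_min), minority_label), y_maj[keep]])
    return X_bal, y_bal

# Toy usage: 50 majority points vs. 10 minority points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (10, 2))])
y = np.array([0] * 50 + [1] * 10)
X_f, y_f = iqr_filter(X, y)
X_b, y_b = near_miss_1(X_f, y_f, minority_label=1)
```

In practice the same steps are available off the shelf (e.g. `NearMiss` in the imbalanced-learn library); the balanced set would then be fed to the deep neural network classifier described in the paper.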

Keywords:

Deep Neural Network, Hybridization, Imbalanced data, Multi-Class, Near Miss Undersampling.
