Machine Learning Based Assistance to Healthcare Professionals in Disease Prediction and Classification Using Basic Patient Profile

International Journal of Electrical and Electronics Engineering
© 2024 by SSRG - IJEEE Journal
Volume 11 Issue 4
Year of Publication: 2024
Authors: Prachi Palsodkar, Deepti Khurge, Ashish Bhagat, Prasanna Palsodkar, P.K. Rajani, Varsha Bendre
How to Cite?

Prachi Palsodkar, Deepti Khurge, Ashish Bhagat, Prasanna Palsodkar, P.K. Rajani, Varsha Bendre, "Machine Learning Based Assistance to Healthcare Professionals in Disease Prediction and Classification Using Basic Patient Profile," SSRG International Journal of Electrical and Electronics Engineering, vol. 11, no. 4, pp. 140-150, 2024. Crossref, https://doi.org/10.14445/23488379/IJEEE-V11I4P115

Abstract:

A patient's profile is critical for medical practitioners, clinicians, and researchers performing clinical evaluations, research studies, and epidemiological investigations. Analyzing patient data provides insight into symptom prevalence and trends across diverse medical conditions, aiding trend detection, diagnosis, treatment, and public health improvement. This work applies the Machine Learning (ML) life cycle, comprising data balancing, feature analysis, K-fold cross-validation, and hyperparameter tuning, to develop classification models that predict the presence or absence of disease. Accuracy, Recall, F1 score, Area Under the Curve (AUC), and the Jaccard Index are used to evaluate ML classifiers such as Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and boosting models. Gradient Boosting emerges as the best-performing model, combining predictive performance with computational economy, making it well suited to this classification task. This comprehensive approach improves the understanding of illness causes, facilitates personalized treatment, and informs preventive measures.
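To make the described life cycle concrete, the following is a minimal Python sketch of such a pipeline using scikit-learn and imbalanced-learn. It assumes a tabular patient-profile file with a binary Outcome column (0/1), uses SMOTE for data balancing, stratified K-fold cross-validation with grid search for hyperparameter tuning of a Gradient Boosting classifier, and reports the metrics listed above. The file name, column names, and parameter grid are illustrative assumptions, not the authors' exact configuration.

# Minimal sketch of the ML life cycle described above: data balancing (SMOTE),
# K-fold cross-validation, hyperparameter tuning, and evaluation with
# Accuracy, Recall, F1 score, AUC, and the Jaccard Index.
# File name, column names, and the parameter grid are illustrative assumptions.
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, GridSearchCV, train_test_split
from sklearn.metrics import (accuracy_score, recall_score, f1_score,
                             roc_auc_score, jaccard_score)

# Hypothetical patient-profile dataset with a binary (0/1) "Outcome" label
df = pd.read_csv("disease_symptoms_patient_profile.csv")
X = pd.get_dummies(df.drop(columns=["Outcome"]))   # one-hot encode categorical features
y = df["Outcome"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Balance only the training split so the test set remains untouched
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)

# Hyperparameter tuning with stratified K-fold cross-validation
param_grid = {"n_estimators": [100, 200],
              "learning_rate": [0.05, 0.1],
              "max_depth": [2, 3]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(GradientBoostingClassifier(random_state=42),
                      param_grid, scoring="f1", cv=cv, n_jobs=-1)
search.fit(X_bal, y_bal)

# Evaluate the tuned model on the held-out test split
best = search.best_estimator_
y_pred = best.predict(X_test)
y_prob = best.predict_proba(X_test)[:, 1]
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Recall:  ", recall_score(y_test, y_pred))
print("F1 score:", f1_score(y_test, y_pred))
print("AUC:     ", roc_auc_score(y_test, y_prob))
print("Jaccard: ", jaccard_score(y_test, y_pred))

Note that oversampling is applied only to the training data, so the reported metrics reflect performance on an untouched test split and avoid leakage from synthetic samples.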

Keywords:

Machine Learning, Patient profile, Disease prediction, Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT).

References:

[1] Mojdeh Rastgoo et al., “Tackling the Problem of Data Imbalancing for Melanoma Classification,” Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies, vol. 2, pp. 32-39, 2016.
[CrossRef] [Google Scholar] [Publisher Link]
[2] Jundong Li et al., “Feature Selection: A Data Perspective,” ACM Computing Surveys, vol. 50, no. 6, pp. 1-45, 2017.
[CrossRef] [Google Scholar] [Publisher Link]
[3] Tzu-Tsung Wong, and Po-Yang Ye, “Reliable Accuracy Estimates from k-Fold Cross Validation,” IEEE Transactions on Knowledge and Data Engineering, vol. 32, no. 8, pp. 1586-1594, 2020.
[CrossRef] [Google Scholar] [Publisher Link]
[4] Li Yang, and Abdallah Shami, “On Hyperparameter Optimization of Machine Learning Algorithms: Theory and Practice,” Neurocomputing, vol. 415, pp. 295-316, 2020.
[CrossRef] [Google Scholar] [Publisher Link]
[5] Vikramaditya Jakkula, “Tutorial on Support Vector Machine (SVM),” School of EECS, Washington State University, vol. 37, 2006.
[Google Scholar] [Publisher Link]
[6] Michele Fratello, and Roberto Tagliaferri, “Decision Trees and Random Forests,” Encyclopedia of Bioinformatics and Computational Biology, vol. 1, pp. 374-383, 2019.
[CrossRef] [Google Scholar] [Publisher Link]
[7] Rémi Bardenet et al., “Collaborative Hyperparameter Tuning,” Proceedings of the 30th International Conference on Machine Learning, vol. 28, no. 2, pp. 199-207, 2013.
[Google Scholar] [Publisher Link]
[8] P. McCullagh, and John A. Nelder, Generalized Linear Models, 2nd ed., Routledge, 1989.
[CrossRef] [Google Scholar] [Publisher Link]
[9] Sara Tehranipoor, Nima Karimian, and Jack Edmonds, “Breaking AES-128: Machine Learning-Based SCA, under Different Scenarios and Devices,” 2023 IEEE International Conference on Cyber Security and Resilience (CSR), Venice, Italy, pp. 564-571, 2023.
[CrossRef] [Google Scholar] [Publisher Link]
[10] Alex J. Smola, and Bernhard Schölkopf, “A Tutorial on Support Vector Regression,” Statistics and Computing, vol. 14, pp. 199-222, 2004.
[CrossRef] [Google Scholar] [Publisher Link]
[11] Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer New York, 2009.
[CrossRef] [Google Scholar] [Publisher Link]
[12] Paritosh Jadhao et al., “Prediction of Early Stage Alzheimer’s Using Machine Learning Algorithm,” 2023 4th International Conference for Emerging Technology (INCET), Belgaum, India, pp. 1-5, 2023.
[CrossRef] [Google Scholar] [Publisher Link]
[13] Dina Elreedy, and Amir F. Atiya, “A Comprehensive Analysis of Synthetic Minority Oversampling Technique (SMOTE) for Handling Class Imbalance,” Information Sciences, vol. 505, pp. 32-64, 2019.
[CrossRef] [Google Scholar] [Publisher Link]
[14] Laksika Tharmalingam, Disease Symptoms and Patient Profile Dataset, Kaggle, 2024. [Online]. Available: https://www.kaggle.com/datasets/uom190346a/disease-symptoms-and-patient-profile-dataset
[15] Xuchun Wang et al., “Exploratory Study on Classification of Diabetes Mellitus through a Combined Random Forest Classifier,” BMC Medical Informatics and Decision Making, vol. 21, no. 1, pp. 1-14, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[16] Scikit Learn, Decision Tree Classifier. [Online]. Available: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
[17] K. VijiyaKumar et al., “Random Forest Algorithm for the Prediction of Diabetes,” 2019 IEEE International Conference on System, Computation, Automation and Networking (ICSCAN), Pondicherry, India, pp. 1-5, 2019.
[CrossRef] [Google Scholar] [Publisher Link]
[18] Carlos Fernandez-Lozano et al., “Random Forest-Based Prediction of Stroke Outcome,” Scientific Reports, vol. 11, pp. 1-12, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[19] Derara Duba Rufo et al., “Diagnosis of Diabetes Mellitus Using Gradient Boosting Machine (LightGBM),” Diagnostics, vol. 11, no. 9, pp. 1-14, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[20] Seung-Bo Lee et al., “Predicting Parkinson’s Disease Using Gradient Boosting Decision Tree Models with Electroencephalography Signals,” Parkinsonism and Related Disorders, vol. 95, pp. 77-85, 2022.
[CrossRef] [Google Scholar] [Publisher Link]
[21] K. Sudharani, T.C. Sarma, and K. Satya Prasad, “Brain Stroke Detection Using k-Nearest Neighbor and Minimum Mean Distance Technique,” 2015 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT), Kumaracoil, India, pp. 770-776, 2015.
[CrossRef] [Google Scholar] [Publisher Link]