Improved Hybrid Tuning Mel Frequency Cepstral Coefficients with Ant Colony Optimization, and Long Short Term Memory on Speech Hoarseness Detection

Noraziahtulhidayu Kamarudin; SAR Al Haddad

doi:10.14445/23488379/IJEEE-V12I9P112


International Journal of Electrical and Electronics Engineering

Volume 12 Issue 9

Year of Publication : 2025

Authors : Noraziahtulhidayu Kamarudin, SAR Al Haddad


How to Cite?

Noraziahtulhidayu Kamarudin, SAR Al Haddad, "Improved Hybrid Tuning Mel Frequency Cepstral Coefficients with Ant Colony Optimization, and Long Short Term Memory on Speech Hoarseness Detection," SSRG International Journal of Electrical and Electronics Engineering, vol. 12, no. 9, pp. 119-127, 2025. Crossref, https://doi.org/10.14445/23488379/IJEEE-V12I9P112

Abstract:

Detection of hoarse speech through machine learning has been discussed extensively. However, few studies have applied it across different datasets to identify which algorithm achieves high accuracy together with adequate precision, recall, and F1-score. Two datasets are used in this study: the Kaggle Speech dataset and the Saarbruecken Voice Database (SVD). The weaknesses of Mel Frequency Cepstral Coefficients (MFCC) that lower the accuracy rate are addressed by combining feature selection, pitch features, and the selection of appropriate coefficients. With this approach, the accuracy rate increased, particularly when different coefficient parameters were selected and feature selection was applied. The gains in accuracy and in the other performance metrics show the advantages of machine learning techniques for distinguishing hoarse from normal voices, especially in cancer patients.
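
For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how accuracy, precision, recall, and F1-score are conventionally derived from a binary confusion matrix, treating "hoarse" as the positive class. The class name `ConfusionCounts`, the function `classification_metrics`, and the counts in the usage example are hypothetical and chosen for illustration only; they are not taken from the study and do not reproduce its results.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion counts, with 'hoarse' as the positive class."""
    tp: int  # hoarse samples predicted as hoarse
    fp: int  # normal samples predicted as hoarse
    fn: int  # hoarse samples predicted as normal
    tn: int  # normal samples predicted as normal


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Return accuracy, precision, recall, and F1; 0.0 when a ratio is undefined."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F1 is the harmonic mean of precision and recall,
    # which simplifies to 2*TP / (2*TP + FP + FN).
    f1_denominator = 2 * c.tp + c.fp + c.fn
    f1 = 2 * c.tp / f1_denominator if f1_denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not results from the study.
    example = ConfusionCounts(tp=40, fp=10, fn=5, tn=45)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

Reporting all four values matters when the hoarse and normal classes are imbalanced, since accuracy alone can stay high even when most hoarse samples are missed.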

Keywords:

Speech hoarseness, Normal, Hoarse speech, Ant colony optimization, Long short-term memory, Feature selection, Feature vector.

