TSCE and CD-CATL Driven Framework for Robust and Real-Time Voice Disorder Detection and Classification

S. Navaneethan; D. J. Ashpin Pabi; C. Ambika Bhuvaneswari; M. Nalini

doi:10.14445/23488549/IJECE-V12I8P132

International Journal of Electronics and Communication Engineering

Volume 12 Issue 8

Year of Publication : 2025

Authors : S. Navaneethan, D. J. Ashpin Pabi, C. Ambika Bhuvaneswari, M. Nalini

How to Cite?

S. Navaneethan, D. J. Ashpin Pabi, C. Ambika Bhuvaneswari, M. Nalini, "TSCE and CD-CATL Driven Framework for Robust and Real-Time Voice Disorder Detection and Classification," SSRG International Journal of Electronics and Communication Engineering, vol. 12, no. 8, pp. 375-383, 2025. Crossref, https://doi.org/10.14445/23488549/IJECE-V12I8P132

Abstract:

Developing precise, non-invasive methods for diagnosing vocal disorders in clinical speech diagnostics is highly challenging owing to the wide variation in demographic, linguistic, and acoustic features. This paper proposes a deep learning-based system that identifies and classifies vocal fold defects in the Aachen Voice Pathology Database (AVPD) using Temporal Spectro-Context Encoding (TSCE) and Cross-Domain Context-Aware Transfer Learning (CD-CATL). The dataset contains 388 annotated, high-quality speech samples covering a wide range of conditions, such as paralysis, edema, nodules, and polyps. In the preprocessing pipeline, the data are time-aligned with dynamic time warping after Gammatone-based spectrotemporal decomposition with the short-time Fourier transform. The TSCE module preserves phonatory dynamics while encoding local and long-range acoustic interactions through dilated convolutions and multi-head attention. The CD-CATL architecture combines memory-augmented transformer streams with multi-scale convolutional attention so that the system learns domain-invariant features while retaining disease-specific representations. The model outperforms baseline CNN and RNN models on all standard evaluation measures, achieving a sensitivity of 97.81%, a specificity of 98.56%, and an accuracy of 98.89%. Low-latency deployment optimized with ONNX and TensorRT enables real-time inference, making the system suitable for telehealth use. The proposed approach appears to have the potential to provide clinically sound, scalable, and objective voice disorder screening across a range of low-resource healthcare environments.
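
For readers less familiar with the evaluation measures quoted above, the following minimal sketch shows how sensitivity, specificity, and accuracy are conventionally derived from confusion counts in a binary screening setting (pathological versus healthy). The function name `screening_metrics` and the example counts are hypothetical and are not taken from the paper; the abstract does not state how the measures are aggregated across disorder classes, so the binary case is an assumption made here for illustration.

```python
def screening_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from binary confusion counts.

    tp: pathological samples correctly flagged
    fn: pathological samples missed
    tn: healthy samples correctly cleared
    fp: healthy samples wrongly flagged
    """
    if min(tp, fn, tn, fp) < 0:
        raise ValueError("counts must be non-negative")
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one pathological and one healthy sample")

    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # overall fraction correct
    return sensitivity, specificity, accuracy


# Hypothetical counts chosen only to illustrate the arithmetic;
# they are NOT results from the paper.
sens, spec, acc = screening_metrics(tp=90, fn=10, tn=95, fp=5)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")
```

In this binary setting, accuracy is a prevalence-weighted average of sensitivity and specificity, which is why all three are usually reported together for screening tools.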

Keywords:

Voice pathology detection, Deep learning, Temporal spectro-context encoding, Transfer learning, Convolutional attention, Transformer networks, Gammatone-STFT, Telehealth diagnostics.

