A Hybrid Deep Learning Framework for Marathi Speech-Based Stress Detection Using GRU and Handcrafted Audio Features
| International Journal of Electronics and Communication Engineering |
| © 2026 by SSRG - IJECE Journal |
| Volume 13 Issue 2 |
| Year of Publication : 2026 |
| Authors : Smita S. Patil, Meena Chavan |
How to Cite?
Smita S. Patil, Meena Chavan, "A Hybrid Deep Learning Framework for Marathi Speech-Based Stress Detection Using GRU and Handcrafted Audio Features," SSRG International Journal of Electronics and Communication Engineering, vol. 13, no. 2, pp. 195-207, 2026. Crossref, https://doi.org/10.14445/23488549/IJECE-V13I2P115
Abstract:
Stress is a major factor affecting mental and physical health, which motivates the development of non-invasive, early, and accessible detection methods. Voice-based stress detection is one such non-invasive approach, because prosodic and acoustic features of speech are indicative of emotional state. However, research on regional languages such as Marathi is scarce. This paper presents a hybrid deep learning framework for stress detection in Marathi speech that combines handcrafted audio features with deep temporal representations learned by a Gated Recurrent Unit (GRU)-based network. The system applies pre-processing steps, including silence removal and background-noise reduction, to improve overall robustness. It extracts 19 handcrafted features from each audio clip, including MFCCs, chroma, spectral contrast, zero-crossing rate, and spectral rolloff. In parallel, sequences of MFCCs are fed to a GRU to model temporal information. The outputs of the two feature branches are then concatenated and passed to fully connected layers for classification. The proposed method achieved 92% accuracy with an F1-score of roughly 0.92 on Marathi speech. CNN and RNN models evaluated for comparison reached accuracies of 84% and 78%, respectively. The results indicate that integrating statistical and temporal features with improved pre-processing improves stress detection performance, offering a scalable solution that can be embedded in mental health monitoring for a low-resource language.
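The two-branch fusion described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the 19-dimensional handcrafted vector and the concatenation step come from the abstract, while the 13 MFCC coefficients per frame, the hidden sizes, the two-class output, and the random weights and inputs are all assumptions chosen only to make the example runnable.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru(rng, input_dim, hidden_dim):
    """Random GRU weights (illustrative only; a real model would be trained)."""
    params = []
    for _ in range(3):  # update gate, reset gate, candidate state
        params.append(rng.normal(0.0, 0.1, size=(hidden_dim, input_dim)))
        params.append(rng.normal(0.0, 0.1, size=(hidden_dim, hidden_dim)))
        params.append(np.zeros(hidden_dim))
    return tuple(params)

def gru_last_state(seq, params):
    """Run a single-layer GRU over seq of shape (T, D); return the final state (H,)."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    h = np.zeros(bz.shape[0])
    for x in seq:
        z = sigmoid(Wz @ x + Uz @ h + bz)               # update gate
        r = sigmoid(Wr @ x + Ur @ h + br)               # reset gate
        h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)   # candidate state
        h = (1.0 - z) * h + z * h_tilde                 # one common GRU convention
    return h

def hybrid_forward(handcrafted, mfcc_seq, gru_params, W_fc, b_fc, W_out, b_out):
    """Concatenate the handcrafted vector with the GRU summary, then classify."""
    temporal = gru_last_state(mfcc_seq, gru_params)
    fused = np.concatenate([handcrafted, temporal])
    hidden = np.maximum(0.0, W_fc @ fused + b_fc)       # ReLU dense layer
    logits = W_out @ hidden + b_out
    exp = np.exp(logits - logits.max())                 # numerically stable softmax
    return exp / exp.sum()

# Assumed sizes: only N_HANDCRAFTED = 19 is stated in the abstract.
N_HANDCRAFTED, N_MFCC, N_FRAMES = 19, 13, 50
GRU_HIDDEN, FC_HIDDEN, N_CLASSES = 32, 16, 2

rng = np.random.default_rng(0)
handcrafted = rng.normal(size=N_HANDCRAFTED)            # stand-in for clip-level features
mfcc_seq = rng.normal(size=(N_FRAMES, N_MFCC))          # stand-in for frame-level MFCCs

gru_params = init_gru(rng, N_MFCC, GRU_HIDDEN)
W_fc = rng.normal(0.0, 0.1, size=(FC_HIDDEN, N_HANDCRAFTED + GRU_HIDDEN))
b_fc = np.zeros(FC_HIDDEN)
W_out = rng.normal(0.0, 0.1, size=(N_CLASSES, FC_HIDDEN))
b_out = np.zeros(N_CLASSES)

probs = hybrid_forward(handcrafted, mfcc_seq, gru_params, W_fc, b_fc, W_out, b_out)
print(probs)  # class probabilities; meaningless here because the weights are random
```

In practice the handcrafted vector and MFCC frames would come from an audio feature extractor applied to the pre-processed clip, and the weights would be learned end to end; the sketch only shows how a fixed-length statistical summary and a variable-length temporal summary can share one classifier.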
Keywords:
Stress, MFCC, Marathi, CNN, Deep learning.
