Enhancing Music Emotion Recognition with LSTM: Evaluating Various Embedding Techniques

International Journal of Electronics and Communication Engineering
© 2025 by SSRG - IJECE Journal
Volume 12, Issue 6
Year of Publication: 2025
Authors: Affreen Ara, Rekha V
How to Cite?
Affreen Ara, Rekha V, "Enhancing Music Emotion Recognition with LSTM: Evaluating Various Embedding Techniques," SSRG International Journal of Electronics and Communication Engineering, vol. 12, no. 6, pp. 293-303, 2025. Crossref, https://doi.org/10.14445/23488549/IJECE-V12I6P123
Abstract:
This study investigates the application of Long Short-Term Memory (LSTM) networks to emotion classification in music lyrics, focusing on the comparative effectiveness of different word embedding techniques. It evaluates static embeddings (GloVe, Word2Vec, FastText) against contextual embeddings (BERT, DistilBERT) across three datasets: MER Lyrics, Mood Lyrics, and Combined Lyrics. The study also examines the role of stylistic and content-based features in improving classification accuracy. The results demonstrate that contextual embeddings considerably outperform static embeddings, achieving accuracy rates of up to 98% compared to 60% for the static approaches. Moreover, combining multiple lyric datasets improves model generalization. These findings highlight the potential of transformer-based models for advancing music emotion recognition. Future research will focus on optimizing large embedding models with techniques such as pruning, quantization, and distillation to improve computational efficiency.
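The architecture the abstract describes, token embeddings fed through an LSTM whose final hidden state drives an emotion classifier, can be sketched minimally. The NumPy sketch below implements a standard LSTM cell and a sequence encoder over a lyric; the embedding dimension, hidden size, and all parameter values are hypothetical illustrations, not the paper's actual configuration.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step over a lyric token's embedding vector x.
    W: (4H, D), U: (4H, H), b: (4H,) stack the four gate parameter sets."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])         # input gate
    f = sigmoid(z[H:2 * H])     # forget gate
    o = sigmoid(z[2 * H:3 * H]) # output gate
    g = np.tanh(z[3 * H:4 * H]) # candidate cell update
    c = f * c_prev + i * g      # new cell state
    h = o * np.tanh(c)          # new hidden state
    return h, c

def encode_lyric(embeddings, W, U, b, H):
    """Run the LSTM over a sequence of token embeddings; the final
    hidden state summarizes the lyric for an emotion classifier."""
    h = np.zeros(H)
    c = np.zeros(H)
    for x in embeddings:
        h, c = lstm_step(x, h, c, W, U, b)
    return h

# Toy run: 5 tokens with hypothetical 50-d embeddings, hidden size 16.
rng = np.random.default_rng(0)
D, H = 50, 16
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
tokens = rng.normal(size=(5, D))
h_final = encode_lyric(tokens, W, U, b, H)
```

In the study's setting, `tokens` would come from static embeddings (GloVe, Word2Vec, FastText) or contextual ones (BERT, DistilBERT), and `h_final` would feed a final classification layer over the emotion classes.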
Keywords:
LSTM, Embedding, BERT, Emotion classification, Emotion.