Implementing Concatenative Text-To-Speech Synthesis System for Marathi Language using Python

International Journal of Electrical and Electronics Engineering
© 2022 by SSRG - IJEEE Journal
Volume 9 Issue 9
Year of Publication : 2022
Authors : Vinayak K. Bairagi, Sarang L. Joshi, Vastav Bharambe
How to Cite:

Vinayak K. Bairagi, Sarang L. Joshi, Vastav Bharambe, "Implementing Concatenative Text-To-Speech Synthesis System for Marathi Language using Python," SSRG International Journal of Electrical and Electronics Engineering, vol. 9, no. 9, pp. 1-11, 2022. Crossref, https://doi.org/10.14445/23488379/IJEEE-V9I9P101

Abstract:

A Text-to-Speech (TTS) synthesiser is a computer-based system that converts arbitrary input text into speech. A TTS system is helpful not only for people with speech or visual impairments but also for the educationally disadvantaged and the underprivileged. Many TTS systems exist for English, yet many people worldwide are neither literate in English nor comfortable speaking, writing and reading it. A local-language interface needs to be developed for such people. Considering this need, we have attempted to develop a TTS system for the Marathi language using Python. Marathi is the fourth most widely spoken language in India and an official language of the Indian states of Maharashtra and Goa. Marathi is understood and spoken by over 100 million people, not only in India but also in Mauritius and Israel. A Marathi TTS system will be useful for people in Maharashtra and for the many migrants who come to the state in search of jobs, business or education.
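The concatenative approach named in the title can be illustrated with a minimal sketch. It assumes a hypothetical inventory of pre-recorded waveforms keyed by Devanagari akshara (a consonant or conjunct plus any vowel sign), splits each word into such units with a simplified orthographic rule, and joins the stored waveforms with a short linear crossfade. The segmentation rule, the `inventory` dictionary, the 16 kHz sample rate, the crossfade and pause lengths, and the sine-tone stand-ins for recordings are all illustrative assumptions, not the authors' implementation; the rule also ignores schwa deletion and punctuation.

```python
import math
import unicodedata

SAMPLE_RATE = 16_000  # assumed sampling rate of the recorded units (Hz)
CROSSFADE = 80        # assumed overlap between units, in samples (5 ms at 16 kHz)
VIRAMA = "\u094d"     # Devanagari sign virama (halant), which joins consonants


def split_aksharas(word: str) -> list[str]:
    """Split a Devanagari word into akshara-like units (simplified rule).

    A new unit starts at every character that is not a combining mark,
    unless the previous unit ends in a virama (a conjunct such as क्ष).
    """
    units: list[str] = []
    for ch in word:
        is_mark = unicodedata.category(ch).startswith("M")  # matras, anusvara, nukta
        if units and (is_mark or units[-1].endswith(VIRAMA)):
            units[-1] += ch
        else:
            units.append(ch)
    return units


def concatenate(waves: list[list[float]], overlap: int = CROSSFADE) -> list[float]:
    """Join waveforms, blending each boundary with a linear crossfade."""
    out: list[float] = []
    for wave in waves:
        n = min(overlap, len(out), len(wave))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit rises linearly
            out[start + i] = out[start + i] * (1 - w) + wave[i] * w
        out.extend(wave[n:])
    return out


def synthesize(text: str, inventory: dict[str, list[float]],
               pause: int = SAMPLE_RATE // 10) -> list[float]:
    """Look up one recorded waveform per unit and concatenate them."""
    waves: list[list[float]] = []
    for k, word in enumerate(text.split()):
        if k:
            waves.append([0.0] * pause)  # assumed silence between words
        for unit in split_aksharas(word):
            if unit not in inventory:
                # A fuller system would back off to smaller units here.
                raise KeyError(f"no recorded unit for {unit!r}")
            waves.append(inventory[unit])
    return concatenate(waves)


def tone(freq: float, n: int = 1600) -> list[float]:
    """Stand-in for a recorded unit: n samples of a sine wave."""
    return [0.5 * math.sin(2 * math.pi * freq * t / SAMPLE_RATE) for t in range(n)]


if __name__ == "__main__":
    inventory = {"म": tone(200), "रा": tone(250), "ठी": tone(300)}
    print(split_aksharas("मराठी"))               # ['म', 'रा', 'ठी']
    print(len(synthesize("मराठी", inventory)))   # 4640 = 3 * 1600 - 2 * 80
```

Each crossfade shortens the output by the overlap length, which is why three 1600-sample units yield 4640 samples. The resulting samples could be written to a WAV file with Python's standard-library `wave` module after scaling them to 16-bit integers.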

Keywords:

TTS, Speech Synthesis, Natural Language Processing, Text Processing.
