Speech and Speaker Recognition Technology using MFCC and SVM

Anamika Baradiya and Vinay Jain

Citation:

Anamika Baradiya and Vinay Jain, "Speech and Speaker Recognition Technology using MFCC and SVM," International Journal of Electronics and Communication Engineering, vol. 2, no. 5, pp. 6-9, 2015. Crossref, https://doi.org/10.14445/23488549/IJECE-V2I5P105

Abstract

Speaker recognition is an active field of research with important forensic and security applications. Research in speaker recognition has been in progress for almost five decades, and the field still presents several challenges as well as new opportunities. Speech is the most natural form of human communication, and it conveys not only meaning but also the identity of the speaker. A speaker can be recognized by the characteristics of their voice, which are carried in the speech signal. Speaker identification is one of the biometric identification technologies and is nowadays used in many areas. The principle of speaker recognition is to recognize a person from their voice or speech signal. Speaker recognition is divided into two categories: speaker identification and speaker verification. Speaker recognition has a wide range of applications, including voice dialling, telephone shopping, telephone banking, database access services, voice mail and many others. Speaker features are extracted from the input speech of the test subject and matched against the speaker model. A probability score evaluates the similarity between the model and the measured observations. The common approach sets a threshold on the acoustic likelihood ratio to decide whether the test speaker is accepted or rejected. Conventional speaker verification systems use hidden Markov models (HMMs) or Gaussian mixture models (GMMs) to perform the likelihood ratio test [1-6]. These systems use a generative model for every speaker model, which can result in over-fitting and may fail to maximize the discrimination between the speaker and impostors.
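The likelihood-ratio accept/reject rule described in the abstract can be illustrated with a minimal sketch. It assumes diagonal-covariance GMMs with hand-set parameters for the claimed speaker and for a background (impostor) model, toy 2-D vectors standing in for MFCC frames, and an arbitrary threshold of 0. The names `DiagGMM` and `verify` are hypothetical; this shows the conventional GMM likelihood-ratio baseline the abstract refers to, not the authors' SVM-based method.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one feature vector (toy stand-in for an MFCC frame)


@dataclass
class DiagGMM:
    """Hypothetical diagonal-covariance Gaussian mixture model."""
    weights: List[float]          # mixture weights, assumed to sum to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # one diagonal variance vector per component

    def frame_log_likelihood(self, x: Frame) -> float:
        """log p(x | model), combining components with log-sum-exp."""
        comp = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll -= 0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp.append(ll)
        top = max(comp)
        return top + math.log(sum(math.exp(c - top) for c in comp))

    def avg_log_likelihood(self, frames: List[Frame]) -> float:
        """Average per-frame log-likelihood of an utterance."""
        return sum(self.frame_log_likelihood(f) for f in frames) / len(frames)


def verify(frames: List[Frame], speaker: DiagGMM, background: DiagGMM,
           threshold: float = 0.0) -> Tuple[float, bool]:
    """Log-likelihood ratio test: accept if the speaker model beats the background."""
    llr = speaker.avg_log_likelihood(frames) - background.avg_log_likelihood(frames)
    return llr, llr >= threshold


# Toy models with assumed parameters (not trained on real speech).
speaker_model = DiagGMM([1.0], [[1.0, 1.0]], [[0.1, 0.1]])
background_model = DiagGMM([0.5, 0.5], [[0.0, 0.0], [-1.0, -1.0]],
                           [[1.0, 1.0], [1.0, 1.0]])

genuine = [[1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
impostor = [[-1.0, -1.0], [-0.9, -1.1], [0.0, 0.1]]

genuine_llr, genuine_ok = verify(genuine, speaker_model, background_model)
impostor_llr, impostor_ok = verify(impostor, speaker_model, background_model)
print(f"genuine:  LLR={genuine_llr:.2f} accepted={genuine_ok}")
print(f"impostor: LLR={impostor_llr:.2f} accepted={impostor_ok}")
```

The score is averaged per frame so that it does not grow with utterance length; in a real system the threshold would typically be tuned on held-out data rather than fixed at 0.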

Keywords

Speaker verification, Speaker recognition, MFCC, SVM.

References

[1] Peter Day and Asoke K. Nandi, “Robust Text-Independent Speaker Verification Using Genetic Programming,” IEEE Trans. Audio, Speech, and Language Processing, Vol. 15, No. 1, pp. 285-295, 2007.
[2] Minho Jin, Frank K. Soong, and Chang D. Yoo, “A Syllable Lattice Approach to Speaker Verification,” IEEE Trans. Audio, Speech, and Language Processing, Vol. 15, No. 8, pp. 2476-2484, 2007.
[3] A. E. Rosenberg, “Automatic speaker verification: A review,” Proceedings of the IEEE, Vol. 64, pp. 475-487, 1976.
[4] Guiwen Ou and Dengfeng Ke, “Text-independent speaker verification based on relation of MFCC components,” 2004 International Symposium on Chinese Spoken Language Processing, pp. 57-60, Dec. 2004.
[5] A. Mezghani and D. O'Shaughnessy, “Speaker verification using a new representation based on a combination of MFCC and formants,” 2005 Canadian Conference on Electrical and Computer Engineering, pp. 1461-1464, May 2005.
[6] M. M. Homayounpour and I. Rezaian, “Robust Speaker Verification Based on Multi Stage Vector Quantization of MFCC Parameters on Narrow Bandwidth Channels,” ICACT 2008, Vol. 1, pp. 336-340, Feb. 2008.
[7] C.C. Lin, S.H. Chen, T. K. Truong, and Yukon Chang, “Audio Classification and Categorization Based on Wavelets and Support Vector Machine,” IEEE Trans. on Speech and Audio Processing, Vol. 13, No. 5, pp. 644-651, Sept. 2005.
[8] M. Pawlewski and J. Jones, “URU Plus – a scalable component-based speaker-verification system for BT’s 21st century network,” BT Technology Journal, Vol. 23, No. 4, pp. 45-53, Oct. 2005.
[9] Mark Pawlewski and James Jones, “Biometric Technology Today,” BT Security Research Centre, pp. 9-11, June 2006.
[10] Sadaoki Furui, “50 years of progress in speech and speaker recognition,” Department of Computer Science, Tokyo Institute of Technology, pp. 1-9.
[11] Committee on Technology, “Speaker recognition,” National Science and Technology Council, pp. 1-9, 07/08/2006.
[12] S. Mathur, S. K. Choudhary, and J. M. Vyas, “Speaker Recognition System and its Forensic Implications,” Vol. 2, 723, 2013, doi: 10.4172/scientificreports.723.
[13] Osman Büyük, “Telephone-based Text-dependent Speaker Verification,” Boğaziçi University, pp. 1-134, 2011.
[14] B. Yegnanarayana, S. R. Mahadeva Prasanna, Jinu Mariam Zachariah, and Cheedella S. Gupta, “Combining Evidence From Source, Suprasegmental and Spectral Features for a Fixed-Text Speaker Verification System,” IEEE Trans. Speech and Audio Processing, Vol. 13, No. 4, pp. 575-582, July 2005.
[15] L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993.
[16] Aurora Database 2.0, http://www.elda.org/article52.html.