A probabilistic feature-based SVM model for Hindi/English speech recognition

  • Abstract

    Real-time speech recognition faces several challenges, including noise, turbulence, language variation, and crosstalk. In this paper, multi-phase hybridization is applied to address these challenges and provide effective speech recognition. The model is explicitly divided into three main stages, each of which is implicitly divided into several sub-stages that solve a specific sub-problem. The proposed hybrid model resolves acoustic turbulence, background noise, and instrumentation noise in the earliest stage. The rectified speech signals are then processed using ICA and a Fuzzy-HMM approach to generate structural and statistical features; in this stage, the signal is divided into smaller linear blocks for feature extraction. Finally, a fuzzy-weighted SVM is applied to recognize the speech signal. Experiments are conducted on Hindi and English character and sentence datasets, and comparative results are derived against BPNN and PCA models for different sample sets. The comparative results indicate that the model improves the recognition rate effectively.
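    The final classification stage described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy feature vectors (standing in for the ICA/Fuzzy-HMM features), the distance-based membership function, and all parameter choices are assumptions. The key idea, weighting each training sample by its fuzzy class membership, is realized here via the per-sample weights that a standard SVM trainer accepts.

    ```python
    # Hedged sketch of a fuzzy-weighted SVM classifier (illustrative, not the
    # paper's implementation). Requires numpy and scikit-learn.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)

    # Toy feature vectors standing in for ICA/Fuzzy-HMM features of two
    # speech-character classes (well-separated Gaussian clusters).
    X = np.vstack([rng.normal(0.0, 1.0, (50, 8)), rng.normal(3.0, 1.0, (50, 8))])
    y = np.array([0] * 50 + [1] * 50)

    # Fuzzy membership of each sample to its own class: samples near their
    # class centroid get weight close to 1, outliers get lower weight.
    # (This membership function is an assumption for illustration.)
    centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])
    dist = np.linalg.norm(X - centroids[y], axis=1)
    weights = 1.0 / (1.0 + dist)

    # A standard SVM accepts per-sample weights at fit time, which is one way
    # to realize fuzzy weighting: noisy or ambiguous frames influence the
    # decision boundary less than confidently labeled ones.
    clf = SVC(kernel="rbf", gamma="scale")
    clf.fit(X, y, sample_weight=weights)

    acc = clf.score(X, y)
    ```

    On clean, well-separated data like this, the weighting changes little; its benefit appears when outlier frames (e.g. residual noise after the earlier denoising stage) would otherwise distort the margin.
    
    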

  • Keywords

    Speech Recognition, Fuzzy-Weighted SVM, HMM, Probabilistic Features.





Article ID: 10423
DOI: 10.14419/ijet.v7i2.8.10423

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.