A probabilistic feature based SVM model for Hindi/English speech recognition

  • Authors

    • Priyanka Bansal
    • Syed Akhtar Imam
    2018-03-19
    https://doi.org/10.14419/ijet.v7i2.8.10423
  • Speech Recognition, Fuzzy-Weighted, SVM, HMM, Featured.
  • Real-time speech recognition faces several challenges, including noise, turbulence, language, and crosstalk. In this paper, a multi-phase hybrid model is applied to address these challenges and provide effective speech recognition. The model is divided into three main stages, each of which is further divided into sub-stages that solve specific problems. At an early stage, the proposed hybrid model removes acoustic turbulence, background noise, and instrumentation noise. The rectified speech signals are then processed with ICA and a Fuzzy-HMM approach to generate structural and statistical features; in this stage, the signal is divided into smaller linear blocks for feature extraction. Finally, a fuzzy-weighted SVM is applied to recognize the speech signal. Experiments are conducted on Hindi and English character and sentence datasets, and the results are compared against BPNN and PCA models for different sample sets. The comparative results show that the proposed model improves the recognition rate.

  • How to Cite

    Bansal, P., & Akhtar Imam, S. (2018). A probabilistic feature based SVM model for Hindi/English speech recognition. International Journal of Engineering & Technology, 7(2.8), 271-277. https://doi.org/10.14419/ijet.v7i2.8.10423
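The feature-extraction and recognition stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the feature definitions, the membership function, or the SVM solver, so all three are assumptions here. Each signal is split into fixed-length blocks, and two simple block features (mean log-energy and mean zero-crossing rate) stand in for the ICA and Fuzzy-HMM features. Fuzzy memberships follow the distance-to-class-centre rule in the style of Lin and Wang's fuzzy SVM, and a linear SVM whose hinge loss is weighted by those memberships is trained by full-batch subgradient descent. The names `frame_signal`, `block_features`, `fuzzy_memberships`, `train_fuzzy_svm`, `predict` and the synthetic-data helper `make_tone` are hypothetical.

```python
import math
import random


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into fixed-length, possibly overlapping blocks."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def block_features(signal, frame_len=64, hop=32):
    """Assumed features (stand-ins for ICA/Fuzzy-HMM features):
    mean log-energy and mean zero-crossing rate over all blocks."""
    if frame_len < 2:
        raise ValueError("frame_len must be at least 2 for the zero-crossing rate")
    frames = frame_signal(signal, frame_len, hop)
    if not frames:
        raise ValueError("signal is shorter than one block")
    log_energy = [math.log(sum(x * x for x in f) + 1e-12) for f in frames]
    zcr = [sum((a >= 0) != (b >= 0) for a, b in zip(f, f[1:])) / (len(f) - 1)
           for f in frames]
    return [sum(log_energy) / len(frames), sum(zcr) / len(frames)]


def fuzzy_memberships(X, y, delta=1e-3):
    """Assumed membership s_i = 1 - d_i / (r + delta), where d_i is the distance
    to the sample's class centre and r is that class's largest distance."""
    s = [0.0] * len(X)
    for label in set(y):
        idx = [i for i, t in enumerate(y) if t == label]
        centre = [sum(X[i][k] for i in idx) / len(idx) for k in range(len(X[0]))]
        dist = {i: math.dist(X[i], centre) for i in idx}
        r = max(dist.values())
        for i in idx:
            s[i] = 1.0 - dist[i] / (r + delta)
    return s


def train_fuzzy_svm(X, y, s, lam=0.01, lr=0.1, epochs=300):
    """Linear SVM minimising lam/2 * ||w||^2 + mean_i(s_i * hinge_i) by
    full-batch subgradient descent; labels y must be +1 or -1."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wk for wk in w], 0.0
        for xi, yi, si in zip(X, y, s):
            if yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b) < 1:
                # Margin violator: its hinge-loss subgradient is scaled by s_i.
                for k in range(dim):
                    gw[k] -= si * yi * xi[k] / n
                gb -= si * yi / n
        w = [wk - lr * g for wk, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the side of the hyperplane x falls on."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) + b >= 0 else -1


def make_tone(rng, loud, n=512):
    """Hypothetical synthetic data: loud low tone (+1) vs quiet high tone (-1)."""
    amp, freq = (1.0, 0.02) if loud else (0.1, 0.23)
    phase = rng.uniform(0, 2 * math.pi)
    return [amp * math.sin(2 * math.pi * freq * t + phase) + rng.gauss(0, 0.05)
            for t in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [1, -1] * 10
    X = [block_features(make_tone(rng, lab == 1)) for lab in labels]
    s = fuzzy_memberships(X, labels)
    w, b = train_fuzzy_svm(X, labels, s)
    acc = sum(predict(w, b, x) == t for x, t in zip(X, labels)) / len(X)
    print(f"training accuracy: {acc:.2f}")
```

Weighting each sample's hinge loss by its membership means that samples far from their class centre, such as noisy outliers, pull less on the decision boundary. A kernel SVM or a different membership function could be substituted without changing the overall structure of the sketch.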