THE COMPARATIVE PERFORMANCE EVALUATION OF WINDOW FUNCTIONS UNDER NOISY ENVIRONMENT FOR SPEECH RECOGNITION
DOI:
https://doi.org/10.11113/jt.v78.8690
Keywords:
Speech recognition, MFCC, BPNNs, windowing, accuracy
Abstract
The accuracy and user acceptance of speech recognition systems have increased in the last few years, especially for automated identification and biomedical applications. In practice, such a system recognizes an utterance from features obtained through a feature extraction process. One step in feature extraction is windowing, which minimizes the discontinuities at the beginning and end of each frame. Many window functions exist, such as the rectangular, flat top, and Hamming windows, but in real applications usually only the Hamming or Hanning function is used for windowing. This article analyzes the performance of these window functions in order to compare them. The method uses mel-frequency cepstral coefficients (MFCCs) as the feature extraction technique and back propagation neural networks (BPNNs) as the classifier. The results show an accuracy of at least 99%. The best accuracy, 99.86%, is achieved using the rectangular window, with a processing time of 15.47 ms. These results show the superior performance of the rectangular window as a reference for recognizing isolated words from speech.
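The windowing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sampling rate, frame length, hop size, test tone, and the helper names `frame_signal` and `apply_window` are all assumptions chosen for illustration, and only three of the window functions named above are shown.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

# Window generators taking the frame length and returning the weights.
WINDOWS = {
    "rectangular": np.ones,   # no tapering: every sample keeps weight 1
    "hamming": np.hamming,    # tapers the frame edges down to 0.08
    "hanning": np.hanning,    # tapers the frame edges down to 0
}

def apply_window(frames, name):
    """Multiply every frame by the named window function."""
    weights = WINDOWS[name](frames.shape[1])
    return frames * weights

# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz,
# cut into 25 ms frames (400 samples) with a 10 ms hop (160 samples).
fs = 16000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 440 * t)

frames = frame_signal(signal, frame_len=400, hop=160)
windowed = {name: apply_window(frames, name) for name in WINDOWS}
```

In an MFCC front end, each windowed frame would then typically be passed to an FFT and a mel filter bank. The rectangular window leaves the frame unchanged, which is why it costs the least computation of the three.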
License
Copyright of articles that appear in Jurnal Teknologi belongs exclusively to Penerbit Universiti Teknologi Malaysia (Penerbit UTM Press). This copyright covers the rights to reproduce the article, including reprints, electronic reproductions, or any other reproductions of similar nature.