SPOKEN-DIGIT CLASSIFICATION USING ARTIFICIAL NEURAL NETWORK
Keywords: artificial neural networks, signal processing, MFCC, speech recognition
Abstract: Audio classification is one of the most popular applications of artificial neural networks (ANNs). It is at the center of modern AI technology such as virtual assistants, automatic speech recognition, and text-to-speech applications. Spoken-digit classification and its applications have been studied before; however, to the best of the authors' knowledge, very few works on English spoken-digit recognition have used ANN classification. In this study, the authors used the Mel-Frequency Cepstral Coefficient (MFCC) features of the audio recordings, with an ANN as the classifier, to recognize the digit spoken by the speaker. The Audio MNIST dataset was used as training and test data, while the Free Spoken Digit Dataset was used as additional validation data. The model achieved an F1 score of 99.56% on the test data and an F1 score of 81.92% on the validation data.
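The pipeline summarized in the abstract (MFCC features fed to an ANN classifier) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The sample rate, frame length, hop size, filter count, number of coefficients, hidden-layer width, and the function names `mfcc` and `mlp_forward` are all assumptions chosen for illustration. The audio is a synthetic tone standing in for a recording, and the network weights are random and untrained, so the predicted digit carries no meaning; only the data flow and array shapes are of interest.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters with centers spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=8000, frame_len=200, hop=80, n_fft=256,
         n_filters=26, n_coeffs=13):
    # All default parameter values are illustrative assumptions.
    # 1) Pre-emphasis to boost high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2) Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # 3) Power spectrum of each frame (zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # 4) Log mel-filterbank energies.
    energies = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # 5) DCT-II over the filter axis; keep the first n_coeffs coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None] * (2 * n + 1)
                   / (2 * n_filters))
    return energies @ basis.T  # shape: (n_frames, n_coeffs)

def mlp_forward(x, w1, b1, w2, b2):
    # One hidden ReLU layer followed by a softmax over the 10 digit classes.
    hidden = np.maximum(0.0, x @ w1 + b1)
    logits = hidden @ w2 + b2
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

rng = np.random.default_rng(0)
sr = 8000  # assumed sample rate
t = np.arange(sr) / sr
# Synthetic one-second tone plus noise, standing in for a spoken-digit clip.
audio = np.sin(2 * np.pi * 440.0 * t) + 0.01 * rng.standard_normal(sr)

features = mfcc(audio, sr=sr)   # per-frame MFCCs
x = features.mean(axis=0)       # fixed-length vector: average over frames

# Untrained random weights: 13 inputs -> 32 hidden units -> 10 classes.
w1, b1 = rng.standard_normal((13, 32)) * 0.1, np.zeros(32)
w2, b2 = rng.standard_normal((32, 10)) * 0.1, np.zeros(10)
probs = mlp_forward(x, w1, b1, w2, b2)
predicted_digit = int(np.argmax(probs))
```

Averaging the MFCC frames over time is only one simple way to obtain a fixed-length input for a dense network; padding or truncating the frame sequence to a fixed length is a common alternative. In practice the weights would be fit by backpropagation on labeled recordings before `predicted_digit` is used.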