SPEECH EMOTION CLASSIFICATION USING SVM AND MLP ON PROSODIC AND VOICE QUALITY FEATURES
DOI:
https://doi.org/10.11113/jt.v78.6925
Keywords:
Emotion Recognition, SMO, SVM, MLP, Prosodic Features, Voice Quality Features
Abstract
This paper compares emotion classification by the Support Vector Machine (SVM) and the Multi-Layer Perceptron (MLP) neural network, using prosodic and voice quality features extracted from the Berlin Emotional Database. The features were extracted using PRAAT, and classification was carried out in WEKA. Several parameter settings were tested for both SVM and MLP to optimize classification. The results show that MLP outperforms SVM in overall emotion classification accuracy, although training was much faster for SVM than for MLP. The overall accuracy was 76.82% for SVM and 78.69% for MLP. Sadness was the emotion best recognized by MLP, with an accuracy of 89.0%, while anger was the emotion best recognized by SVM, with an accuracy of 87.4%. The emotions most often confused by MLP were happiness and fear, while for SVM they were disgust and fear.
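The figures reported above (overall accuracy, per-emotion accuracy, and the most often confused pair of emotions) are the kind of quantities usually read off a classifier's confusion matrix. The following is a minimal sketch of that calculation in plain Python. The three-class matrix is invented purely for illustration and is not data from the paper, and the function names are hypothetical; the paper itself obtained its results in WEKA.

```python
from typing import Dict, List, Tuple

# Hypothetical toy confusion matrix (NOT the paper's data).
# Rows are true emotions, columns are predicted emotions.
LABELS: List[str] = ["anger", "fear", "happiness"]
TOY_MATRIX: List[List[int]] = [
    [8, 1, 1],  # true anger
    [1, 6, 3],  # true fear
    [0, 4, 6],  # true happiness
]


def overall_accuracy(matrix: List[List[int]]) -> float:
    """Fraction of all utterances that were labelled correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_accuracy(labels: List[str],
                       matrix: List[List[int]]) -> Dict[str, float]:
    """For each true emotion, the fraction of its utterances labelled correctly."""
    result: Dict[str, float] = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        result[label] = matrix[i][i] / row_total if row_total else 0.0
    return result


def most_confused_pair(labels: List[str],
                       matrix: List[List[int]]) -> Tuple[str, str, int]:
    """Pair of distinct emotions with the most mutual misclassifications."""
    best: Tuple[str, str, int] = ("", "", -1)
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            # Count errors in both directions: i taken for j, and j taken for i.
            errors = matrix[i][j] + matrix[j][i]
            if errors > best[2]:
                best = (labels[i], labels[j], errors)
    return best


if __name__ == "__main__":
    print(overall_accuracy(TOY_MATRIX))
    print(per_class_accuracy(LABELS, TOY_MATRIX))
    print(most_confused_pair(LABELS, TOY_MATRIX))
```

Counting confusion in both directions is one possible convention; the paper does not state which definition it used.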
License
Copyright of articles that appear in Jurnal Teknologi belongs exclusively to Penerbit Universiti Teknologi Malaysia (Penerbit UTM Press). This copyright covers the rights to reproduce the article, including reprints, electronic reproductions, or any other reproductions of similar nature.