• Krishna Mohan Kudiri, Computer and Information Sciences Department, Universiti Teknologi PETRONAS, Malaysia




Relative Bin Frequency Coefficients (RBFC), Relative Sub-Image Based (RSB), Support Vector Machine (SVM)


Estimating human emotions during a conversation is difficult for a computer. In this study, facial expressions and speech are used to estimate six emotions (anger, sadness, happiness, boredom, disgust and surprise). A hybrid system that combines facial expressions and speech is proposed to estimate the emotions of a person engaged in a conversational session. Relative Bin Frequency Coefficients (RBFC) and Relative Sub-Image-Based (RSB) features are used for the acoustic and visual modalities, respectively, and a Support Vector Machine (SVM) is used for classification. The study shows that the proposed feature extraction from acoustic and visual data, together with the proposed fusion technique, is the most prominent factor affecting the emotion detection system. Other factors also affect the system, but their effect is relatively minor. The bimodal system was observed to perform worse than the unimodal system on deliberate facial expressions; to address this problem, a suitable database is used. The results indicate that the proposed system outperformed the rest on the basic emotional classes.
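The abstract does not spell out the fusion rule, so the following is only a minimal sketch of generic decision-level fusion, assuming each modality's classifier already yields one score per emotion class. The function names, the weight `w_acoustic`, and the score values are all hypothetical and are not taken from the paper.

```python
# Illustrative only: generic decision-level (late) fusion of two classifiers.
# Scores, weight, and function names are hypothetical, not the paper's method.

EMOTIONS = ["anger", "sadness", "happiness", "boredom", "disgust", "surprise"]


def fuse_scores(acoustic, visual, w_acoustic=0.5):
    """Return the weighted average of per-class scores from two modalities."""
    if len(acoustic) != len(EMOTIONS) or len(visual) != len(EMOTIONS):
        raise ValueError("expected one score per emotion class")
    if not 0.0 <= w_acoustic <= 1.0:
        raise ValueError("w_acoustic must lie in [0, 1]")
    w_visual = 1.0 - w_acoustic
    return [w_acoustic * a + w_visual * v for a, v in zip(acoustic, visual)]


def predict_emotion(acoustic, visual, w_acoustic=0.5):
    """Pick the emotion whose fused score is highest."""
    fused = fuse_scores(acoustic, visual, w_acoustic)
    best = max(range(len(EMOTIONS)), key=lambda i: fused[i])
    return EMOTIONS[best]


# Made-up per-class scores for one speech segment and its matching face image.
acoustic_scores = [0.40, 0.10, 0.30, 0.10, 0.05, 0.05]
visual_scores = [0.10, 0.05, 0.60, 0.05, 0.10, 0.10]
label = predict_emotion(acoustic_scores, visual_scores)
```

Setting `w_acoustic` to 1.0 or 0.0 reduces the sketch to a unimodal decision, which gives a simple way to compare bimodal and unimodal behaviour on the same scores.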




How to Cite

HYBRID FUSION OF FACE AND SPEECH INFORMATION FOR BIMODAL EMOTION ESTIMATION. (2016). Jurnal Teknologi, 78(8-2). https://doi.org/10.11113/jt.v78.9538