A Natural Conversational Virtual Human with Multimodal Dialog System

Authors

  • Itimad Raheem Ali, Faculty of Computing, Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor, Malaysia
  • Ghazali Sulong, Faculty of Computing, Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor, Malaysia
  • Ahmad Hoirul Basori, Interactive Media and Human Interface Lab, Department of Informatics, Faculty of Information Technology, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia

DOI:

https://doi.org/10.11113/jt.v71.3859

Keywords:

Speech Synchronization, Dialog Behavior Systems.

Abstract

Making virtual human characters realistic and credible in a real-time automated dialog animation system is essential. This kind of animation carries important elements for many applications, such as games, virtual agents, and movie animation. It is also important for applications that require interaction between humans and computers. For this purpose, however, the machine must have sufficient intelligence to recognize and synthesize human voices. As one of the most vital interaction methods between human and machine, speech has recently received significant attention, especially in avatar research. One of the challenges is to create precise lip movements for the avatar and synchronize them with recorded audio. This paper introduces the concept of a multimodal dialog system for a virtual character and focuses on the output part of such systems. More specifically, its focus is on behavior planning and the development of data control languages (DCL).
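
To make the behavior-planning output concrete, the sketch below shows one plausible shape for a data control language of this kind, modeled on the well-known Behavior Markup Language (BML). This is a minimal illustration only: the element and attribute names (bml, speech, sync, gesture, lexeme, stroke) follow common BML usage, while the helper function and its parameters are hypothetical and are not the system described in this paper.

    # A minimal sketch, assuming a BML-style XML markup as the data control
    # language for behavior planning; the helper is illustrative, not the
    # paper's implementation.
    import xml.etree.ElementTree as ET

    def plan_utterance(text: str, sync_word: int) -> str:
        """Emit a behavior block that aligns a gesture stroke with one spoken word."""
        bml = ET.Element("bml", id="bml1")
        speech = ET.SubElement(bml, "speech", id="s1")
        words = text.split()
        speech.text = " ".join(words[:sync_word]) + " "  # words before the sync point
        sync = ET.SubElement(speech, "sync", id="tm1")   # named time marker in the speech
        sync.tail = " ".join(words[sync_word:])          # words after the sync point
        # A beat gesture whose stroke phase is tied to the speech time marker;
        # a viseme/lip-sync track driven by the recorded audio would attach the same way.
        ET.SubElement(bml, "gesture", id="g1", lexeme="BEAT", stroke="s1:tm1")
        return ET.tostring(bml, encoding="unicode")

    print(plan_utterance("Nice to meet you", 2))

Anchoring every behavior to named synchronization points in the speech stream is what lets a realizer keep lip movements, gestures, and the recorded audio in step.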

Published

2014-12-30

How to Cite

Ali, I. R., Sulong, G., & Basori, A. H. (2014). A Natural Conversational Virtual Human with Multimodal Dialog System. Jurnal Teknologi, 71(5). https://doi.org/10.11113/jt.v71.3859