Speech Communication - Special issue on auditory-visual speech processing
Embodiment in conversational interfaces: Rea. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.
Expressions of Empathy in ECAs. IVA '08: Proceedings of the 8th International Conference on Intelligent Virtual Agents.
Implementing expressive gesture synthesis for embodied conversational agents. GW '05: Proceedings of the 6th International Conference on Gesture in Human-Computer Interaction and Simulation.
A Virtual Head Driven by Music Expressivity. IEEE Transactions on Audio, Speech, and Language Processing.
This paper presents an analysis of the verbal and non-verbal cues of Embodied Conversational Agents (ECAs), with a special focus on REA and GRETA, to support further research aimed at correcting those traits of their performance that end users still consider unnatural. Despite the striking performance of new-generation ECAs, some important shortcomings make these conversational agents seem unreliable to users, who often prefer interacting with a conventional computer interface for information retrieval. This preference can stem from several factors, such as the quality of the speech synthesis or the inevitable unnaturalness of the graphics animating the avatar. Beyond these unavoidable traits, poor synchronization between verbal and non-verbal behaviour may also contribute to unfavourable results. An instance of synchronization patterns between non-verbal cues and speech is analysed here and re-applied to the basic architecture of an ECA in order to improve its verbal and non-verbal synchronization. A proposal for future inquiry into an alternative model for the final MP4 output is also put forward.
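The abstract does not specify how the synchronization patterns are re-applied; as a purely illustrative sketch (all names here, such as `Gesture` and `align_gestures`, are hypothetical and not from the paper), one common approach in ECA behaviour scheduling is to shift each planned gesture stroke to the nearest stressed syllable in the speech timeline, within a tolerance:

```python
# Hypothetical sketch: snapping gesture strokes to speech timestamps.
# Names and parameters are illustrative assumptions, not the paper's method.
from dataclasses import dataclass

@dataclass
class Gesture:
    name: str
    stroke_time: float  # planned stroke apex, in seconds

def align_gestures(gestures, stressed_syllable_times, max_shift=0.3):
    """Shift each gesture stroke to the nearest stressed syllable,
    provided the required shift stays within max_shift seconds."""
    aligned = []
    for g in gestures:
        # find the stressed syllable closest to the planned stroke apex
        nearest = min(stressed_syllable_times,
                      key=lambda t: abs(t - g.stroke_time))
        if abs(nearest - g.stroke_time) <= max_shift:
            aligned.append(Gesture(g.name, nearest))
        else:
            # too far away: keep the original timing rather than distort it
            aligned.append(g)
    return aligned
```

For example, a beat gesture planned at 1.05 s would snap to a stressed syllable at 1.00 s, while one planned 0.5 s away from any syllable would keep its original timing. The `max_shift` tolerance is an assumed tuning parameter.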