Turn-taking cues in task-oriented dialogue

  • Authors:
  • Agustín Gravano; Julia Hirschberg

  • Affiliations:
  • Departamento de Computación, FCEyN, Universidad de Buenos Aires, Argentina, and Laboratorio de Investigaciones Sensoriales, Hospital de Clínicas, Universidad de Buenos Aires, Argentina; Department of Computer Science, Columbia University, New York, NY, USA

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2011

Abstract

As interactive voice response systems become more prevalent and provide increasingly complex functionality, it becomes clear that the challenges facing such systems are not solely in their synthesis and recognition capabilities. Issues such as the coordination of turn exchanges between system and user also play an important role in system usability. In particular, both systems and users have difficulty determining when the other is taking or relinquishing the turn. In this paper, we seek to identify turn-taking cues that are correlated with human-human turn exchanges and are automatically computable. We compare the presence of potential prosodic, acoustic, and lexico-syntactic turn-yielding cues in prosodic phrases preceding turn changes (smooth switches) vs. turn retentions (holds) vs. backchannels in the Columbia Games Corpus, a large corpus of task-oriented dialogues, to determine which features reliably distinguish among these three categories. We identify seven turn-yielding cues, all of which can be extracted automatically, for future use in turn generation and recognition in interactive voice response (IVR) systems. Testing Duncan's (1972) hypothesis that turn-yielding cues are linearly correlated with the occurrence of turn-taking attempts, we further demonstrate that the greater the number of turn-yielding cues present, the greater the likelihood that a turn change will occur. We also identify six cues that precede backchannels, which will also be useful for IVR backchannel generation and recognition; these cues correlate with backchannel occurrence in a quadratic manner. We find similar results for overlapping and for non-overlapping speech.
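
To make the cue-counting relationship concrete, the sketch below shows one way the abstract's claim could be examined on labeled data: for each prosodic phrase, count how many turn-yielding cues are present, estimate the empirical probability that a turn change follows for each cue count, and compare a linear fit against a quadratic one. This is only an illustrative sketch of the general idea, not the authors' analysis; the variable names and all numeric values are hypothetical placeholders, not data from the Columbia Games Corpus.

    import numpy as np

    # Hypothetical per-phrase observations (NOT from the paper):
    # number of turn-yielding cues present (0..7) and whether the
    # following event was a turn change (1) or not (0).
    num_cues = np.array([0, 1, 1, 2, 3, 3, 4, 5, 5, 6, 6, 7, 7, 7])
    turn_change = np.array([0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

    # Empirical probability of a turn change for each observed cue count.
    counts = np.unique(num_cues)
    p_change = np.array([turn_change[num_cues == c].mean() for c in counts])

    # Fit degree-1 (linear) and degree-2 (quadratic) models of
    # P(turn change | cue count).
    linear_fit = np.polyfit(counts, p_change, deg=1)
    quadratic_fit = np.polyfit(counts, p_change, deg=2)

    # Compare residual error of the two fits. In the paper's terms, a
    # better linear fit would support Duncan-style linearity for turn
    # changes, while backchannel likelihood is reported to follow a
    # quadratic pattern.
    for name, coeffs in [("linear", linear_fit), ("quadratic", quadratic_fit)]:
        residuals = p_change - np.polyval(coeffs, counts)
        print(name, "SSE:", float(np.sum(residuals ** 2)))

The same fitting step could be repeated with backchannel labels in place of turn-change labels to contrast the linear and quadratic relationships described in the abstract.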