Prosodic and temporal features for language modeling for dialog

Authors:
Nigel G. Ward; Alejandro Vega; Timo Baumann
Affiliations:
Computer Science, University of Texas at El Paso, 500 West University Avenue, El Paso, TX 79968, USA; Computer Science, University of Texas at El Paso, 500 West University Avenue, El Paso, TX 79968, USA; University of Potsdam, Linguistics Department, Karl-Liebknecht-Straße 24, 14476 Potsdam, Germany
Venue:
Speech Communication
Year:
2012

Abstract

If we can model the cognitive and communicative processes underlying speech, we should be able to better predict what a speaker will do. With this idea as inspiration, we examine a number of prosodic and timing features as potential sources of information on what words the speaker is likely to say next. In spontaneous dialog we find that word probabilities do vary with such features. Measured by perplexity, the most informative of these features included recent speaking rate, volume, and pitch, and the time until the end of the utterance. Using simple combinations of such features to augment trigram language models gave up to an 8.4% perplexity benefit on the Switchboard corpus, and up to a 1.0% relative reduction in word error rate (0.3% absolute) on the Verbmobil II corpus.
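
To make the abstract's metric concrete, here is a minimal Python sketch of how perplexity and a relative reduction are computed, together with one possible way a prosodic or timing feature could adjust baseline word probabilities. The abstract does not say how the features were combined with the trigram model, so the `rescale` function, its feature-dependent ratios, and all toy numbers are assumptions made for illustration only, not the authors' method.

```python
import math


def perplexity(word_probs):
    """Perplexity of a word sequence, given the model probability of each word."""
    log_sum = sum(math.log2(p) for p in word_probs)
    return 2 ** (-log_sum / len(word_probs))


def relative_reduction(baseline, improved):
    """Reduction expressed as a fraction of the baseline value."""
    return (baseline - improved) / baseline


def rescale(baseline_dist, ratios):
    """Hypothetical combination scheme (an assumption, not from the paper):
    scale each baseline word probability by a feature-dependent ratio,
    then renormalize so the distribution sums to 1."""
    scaled = {w: p * ratios.get(w, 1.0) for w, p in baseline_dist.items()}
    total = sum(scaled.values())
    return {w: p / total for w, p in scaled.items()}


# Toy baseline distribution over the next word (made-up numbers).
baseline = {"yeah": 0.5, "the": 0.5}
# Made-up ratio: suppose "yeah" is three times as likely in the current
# feature context (for example, shortly after the other speaker stops).
adjusted = rescale(baseline, {"yeah": 3.0})
```

With these toy values, `adjusted` assigns 0.75 to "yeah" and 0.25 to "the". A model that gives every word probability 0.25 has perplexity 4, and `relative_reduction(100.0, 91.6)` gives 0.084, the form in which a figure such as the 8.4% perplexity benefit is expressed (the baseline of 100 is a placeholder, not a reported value).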