If we can model the cognitive and communicative processes underlying speech, we should be able to better predict what a speaker will do. With this idea as inspiration, we examine a number of prosodic and timing features as potential sources of information on what words the speaker is likely to say next. In spontaneous dialog we find that word probabilities do vary with such features. Measured by perplexity, the most informative features included recent speaking rate, volume, and pitch, and the time remaining until the end of the utterance. Augmenting trigram language models with simple combinations of such features gave up to an 8.4% reduction in perplexity on the Switchboard corpus, and up to a 1.0% relative reduction in word error rate (0.3% absolute) on the Verbmobil II corpus.
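
To make the reported metrics concrete, the following is a minimal Python sketch of how perplexity and a relative reduction are conventionally computed. It is not the authors' code: the function names and the example numbers (a baseline perplexity of 100.0 and the per-word probabilities) are hypothetical and chosen only for illustration. The last helper shows the arithmetic linking the two word error rate figures: a 0.3-point absolute reduction that corresponds to a 1.0% relative reduction suggests a baseline word error rate of roughly 30%, subject to rounding in the reported values.

```python
import math


def perplexity(word_probs):
    """Perplexity of a word sequence, given the model probability of each word."""
    # Average negative log2 probability per word, then exponentiate.
    avg_neg_log2 = -sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2 ** avg_neg_log2


def relative_reduction(baseline, improved):
    """Fractional reduction of a metric relative to its baseline value."""
    return (baseline - improved) / baseline


def implied_baseline(absolute_reduction, relative_reduction_fraction):
    """Baseline value implied by an absolute and a relative reduction."""
    return absolute_reduction / relative_reduction_fraction


# Hypothetical example: a model that assigns probability 0.25 to every word
# is as uncertain as a uniform choice among four words.
uniform_ppl = perplexity([0.25, 0.25, 0.25, 0.25])

# Hypothetical baseline perplexity of 100.0; an 8.4% reduction would give 91.6.
ppl_gain = relative_reduction(100.0, 91.6)

# Word error rate: 0.3 points absolute at 1.0% relative.
baseline_wer = implied_baseline(0.3, 0.01)

print(f"uniform perplexity: {uniform_ppl:.2f}")
print(f"relative perplexity reduction: {ppl_gain:.1%}")
print(f"implied baseline WER: about {baseline_wer:.0f}%")
```

Perplexity is the exponentiated average negative log probability per word, so lower values mean the model found the observed words less surprising; a relative reduction is always stated against the baseline model's value.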