Automatic segmentation is an important technology for both automatic speech recognition and automatic speech understanding. In meetings, participants typically vocalize for only a fraction of the recorded time, yet standard vocal activity detection algorithms for close-talk microphones continue to treat participants independently. In this work we present a multispeaker segmentation system which models a particular aspect of human-human communication, namely vocal interaction: the interdependence between participants' on-off speech patterns. We describe our vocal interaction model, its training, and its use during vocal activity decoding. Our experiments show that this approach almost completely eliminates the problem of crosstalk, and that word error rates on our development set are lower than those obtained with a human-generated reference segmentation. We also observe significant performance improvements on unseen data.
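The abstract does not give implementation details, but the core idea of modeling the interdependence of participants' on-off speech patterns can be sketched as a joint-state decoder: each frame's state is a binary vocal-activity vector over all speakers, transition probabilities between joint states are estimated from training data, and decoding combines per-speaker acoustic scores with this interaction model. Everything below is an illustrative assumption, not the authors' actual system: the function names, the add-one smoothing, and the `(log P(off), log P(on))` per-speaker emission format are all hypothetical.

```python
import itertools
import math

def train_interaction_model(sequences, num_speakers, smoothing=1.0):
    """Estimate joint transition log-probabilities over multispeaker
    on/off states from training sequences of binary activity vectors.
    sequences: list of sequences; each frame is a tuple of 0/1 bits,
    one bit per speaker (1 = speaking)."""
    states = list(itertools.product((0, 1), repeat=num_speakers))
    idx = {s: i for i, s in enumerate(states)}
    # Add-one (Laplace) smoothing so unseen transitions stay possible.
    counts = [[smoothing] * len(states) for _ in states]
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[idx[prev]][idx[cur]] += 1.0
    trans = []
    for row in counts:
        total = sum(row)
        trans.append([math.log(c / total) for c in row])
    return states, trans

def decode(frame_loglik, states, trans):
    """Viterbi decoding of the joint vocal-activity state sequence.
    frame_loglik[t][k] = (log P(speaker k off), log P(speaker k on))
    for frame t, from some per-channel acoustic model."""
    idx_range = range(len(states))

    def emit(t, s):
        # Joint emission score: sum of per-speaker scores for this state.
        return sum(frame_loglik[t][k][states[s][k]]
                   for k in range(len(states[s])))

    score = [emit(0, s) for s in idx_range]
    back = []
    for t in range(1, len(frame_loglik)):
        new_score, new_back = [], []
        for s in idx_range:
            best_prev = max(idx_range, key=lambda p: score[p] + trans[p][s])
            new_score.append(score[best_prev] + trans[best_prev][s] + emit(t, s))
            new_back.append(best_prev)
        score, back = new_score, back + [new_back]
    # Backtrace from the best final state.
    s = max(idx_range, key=lambda i: score[i])
    path = [s]
    for bp in reversed(back):
        s = bp[s]
        path.append(s)
    path.reverse()
    return [states[s] for s in path]
```

Because the state space is all 2^K on/off combinations, a learned transition matrix can assign low probability to transitions into heavy-overlap states, which is one plausible way an interaction model suppresses crosstalk-induced false detections on the non-speaking channels.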