Modeling vocal interaction for segmentation in meeting recognition

  • Authors:
  • Kornel Laskowski; Tanja Schultz

  • Affiliations:
  • interACT, Carnegie Mellon University, Pittsburgh, PA (both authors)

  • Venue:
  • MLMI'07: Proceedings of the 4th International Conference on Machine Learning for Multimodal Interaction
  • Year:
  • 2007


Abstract

Automatic segmentation is an important technology for both automatic speech recognition and automatic speech understanding. In meetings, participants typically vocalize for only a fraction of the recorded time, yet standard vocal activity detection algorithms for close-talk microphones continue to treat participants independently. In this work we present a multispeaker segmentation system which models a particular aspect of human-human communication: vocal interaction, the interdependence between participants' on-off speech patterns. We describe our vocal interaction model, its training, and its use during vocal activity decoding. Our experiments show that this approach almost completely eliminates the problem of crosstalk, and word error rates on our development set are lower than those obtained with human-generated reference segmentation. We also observe significant performance improvements on unseen data.
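
To make the core idea concrete, the following is a minimal sketch (Python/NumPy, not from the paper) of joint vocal-activity decoding: the 2^K on-off configurations of K participants form a single discrete state space, transition probabilities between joint states are estimated from a reference segmentation, and Viterbi decoding recovers the joint state sequence from per-frame scores. All names, the toy data, and the scoring scheme here are illustrative assumptions; the paper's actual model, features, and training procedure are richer.

```python
# A minimal sketch of joint vocal-activity decoding for K participants,
# illustrating the idea of modeling the *joint* on-off speech state rather
# than treating channels independently. All names and the toy scoring are
# hypothetical; this is not the paper's actual implementation.

import numpy as np

K = 2           # number of participants (channels)
S = 2 ** K      # joint states: each state encodes who is speaking as a bit vector

def train_transitions(reference, smoothing=1.0):
    """Estimate joint-state transition probabilities from a reference
    segmentation given as an array of per-frame joint states in [0, S)."""
    counts = np.full((S, S), smoothing)
    for prev, curr in zip(reference[:-1], reference[1:]):
        counts[prev, curr] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def viterbi(frame_loglik, log_trans):
    """Decode the most likely joint-state sequence.
    frame_loglik: (T, S) per-frame log-likelihoods of each joint state
    (e.g. summed per-channel acoustic scores under each on/off hypothesis)."""
    T = frame_loglik.shape[0]
    delta = np.zeros((T, S))
    back = np.zeros((T, S), dtype=int)
    delta[0] = frame_loglik[0] - np.log(S)        # uniform initial-state prior
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans  # (S, S): prev -> curr
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + frame_loglik[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):                # backtrace
        path[t] = back[t + 1, path[t + 1]]
    return path

# Toy example: learn transitions from a fake reference segmentation, then
# decode noisy per-frame scores. State 1 = only speaker A, state 2 = only B.
rng = np.random.default_rng(0)
reference = np.array([1] * 50 + [2] * 50)           # A talks, then B talks
log_trans = np.log(train_transitions(reference))
true_states = np.array([1] * 20 + [2] * 20)
frame_loglik = -2.0 * np.ones((40, S))
frame_loglik[np.arange(40), true_states] = 0.0      # true state scores best
frame_loglik += 0.5 * rng.standard_normal((40, S))  # add observation noise
decoded = viterbi(frame_loglik, log_trans)
print("frame accuracy:", (decoded == true_states).mean())
```

The point of coupling the channels into one state space is that the decoder can penalize improbable joint configurations, for example crosstalk showing up as apparently simultaneous speech on every close-talk channel, which independent per-channel detectors have no way to express.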