A geometric interpretation of non-target-normalized maximum cross-channel correlation for vocal activity detection in meetings

Authors:
Kornel Laskowski;Tanja Schultz
Affiliations:
Universität Karlsruhe, Karlsruhe, Germany;Carnegie Mellon University, Pittsburgh, PA
Venue:
NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
Year:
2007

Abstract

Vocal activity detection is an important technology for both automatic speech recognition and automatic speech understanding. In meetings, standard vocal activity detection algorithms have been shown to be ineffective, because participants typically vocalize for only a fraction of the recorded time and because, while they are not vocalizing, their channels are frequently dominated by crosstalk from other participants. In the present work, we review a particular type of normalization of maximum cross-channel correlation, a feature recently introduced to address the crosstalk problem. We derive a plausible geometric interpretation and show how the frame size affects performance.
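The abstract does not give the feature's formula, so the following is only a minimal sketch of one plausible reading: for a pair of channels, take the maximum of the raw cross-correlation over a small range of lags within one frame, then divide by the frame energy of the non-target channel. The function names, the `max_lag` parameter, the `eps` guard, and the toy signals are all assumptions made for illustration and are not taken from the paper.

```python
def max_cross_correlation(target, other, max_lag):
    """Largest raw cross-correlation between two equal-length frames,
    searched over integer lags in [-max_lag, max_lag]."""
    n = len(target)
    best = float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:  # samples shifted outside the frame are skipped
                total += target[i] * other[j]
        best = max(best, total)
    return best


def energy(frame):
    """Sum of squared samples in the frame."""
    return sum(x * x for x in frame)


def nt_normalized_mxc(target, other, max_lag, eps=1e-12):
    """Maximum cross-channel correlation divided by the energy of the
    non-target channel (assumed normalization, see lead-in)."""
    return max_cross_correlation(target, other, max_lag) / (energy(other) + eps)


# Toy frame: one participant's close-talk signal, and the same signal
# attenuated by 0.5 and delayed by one sample as crosstalk on another channel.
speaker = [0.0, 0.0, 1.0, -2.0, 3.0, -1.0, 2.0, 0.0, 0.0]
leak = [0.0] + [0.5 * x for x in speaker[:-1]]

ratio_for_speaker = nt_normalized_mxc(speaker, leak, max_lag=2)  # close to 2.0
ratio_for_leak = nt_normalized_mxc(leak, speaker, max_lag=2)     # close to 0.5
```

In this toy case the ratio is above 1 for the channel of the participant who is actually vocalizing and below 1 for the channel that only carries crosstalk, which is the kind of contrast such a feature is meant to expose. The frame size here is simply the list length; longer or shorter frames change how many samples contribute to each sum.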