Automatic labeling inconsistencies detection and correction for sentence unit segmentation in conversational speech

Authors:
Sébastien Cuendet;Dilek Hakkani-Tür;Elizabeth Shriberg
Affiliations:
International Computer Science Institute, Berkeley, CA;International Computer Science Institute, Berkeley, CA;International Computer Science Institute, Berkeley, CA and Speech Technology and Research Laboratory, SRI International, Menlo Park, CA
Venue:
MLMI'07 Proceedings of the 4th international conference on Machine learning for multimodal interaction
Year:
2007

Citing 3
Cited 1

BoosTexter: A Boosting-based Systemfor Text Categorization

Machine Learning - Special issue on information retrieval
Anomaly Detection over Noisy Data using Learned Probability Distributions

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Using Boosting to Detect Noisy Data

Revised Papers from the PRICAI 2000 Workshop Reader, Four Workshops held at PRICAI 2000 on Advances in Artificial Intelligence

A Hybrid Generative-Discriminative Approach to Speaker Diarization

MLMI '08 Proceedings of the 5th international workshop on Machine Learning for Multimodal Interaction

Quantified Score

Hi-index	0.00

Visualization

Abstract

In conversational speech, irregularities in the speech such as overlaps and disruptions make it difficult to decide what is a sentence. Thus, despite very precise guidelines on how to label conversational speech with dialog acts (DA), labeling inconsistencies are likely to appear. In this work, we present various methods to detect labeling inconsistencies in the ICSI meeting corpus. We show that by automatically detecting and removing the inconsistent examples from the training data, we significantly improve the sentence segmentation accuracy. We then manually analyze 200 of noisy examples detected by the system and observe that only 13% of them are labeling inconsitencies, while the rest are errors done by the classifier. The errors naturally cluster into 5 main classes for each of which we give hints on how the system can be improved to avoid these mistakes.