Automatic labeling inconsistencies detection and correction for sentence unit segmentation in conversational speech

  • Authors:
  • Sébastien Cuendet;Dilek Hakkani-Tür;Elizabeth Shriberg

  • Affiliations:
  • International Computer Science Institute, Berkeley, CA;International Computer Science Institute, Berkeley, CA;International Computer Science Institute, Berkeley, CA and Speech Technology and Research Laboratory, SRI International, Menlo Park, CA

  • Venue:
  • MLMI'07 Proceedings of the 4th international conference on Machine learning for multimodal interaction
  • Year:
  • 2007

Abstract

In conversational speech, irregularities such as overlaps and disruptions make it difficult to decide what constitutes a sentence. Thus, despite very precise guidelines on how to label conversational speech with dialog acts (DAs), labeling inconsistencies are likely to appear. In this work, we present various methods to detect labeling inconsistencies in the ICSI meeting corpus. We show that by automatically detecting and removing the inconsistent examples from the training data, we significantly improve the sentence segmentation accuracy. We then manually analyze 200 of the noisy examples detected by the system and observe that only 13% of them are labeling inconsistencies, while the rest are errors made by the classifier. The errors naturally cluster into five main classes; for each class, we give hints on how the system can be improved to avoid these mistakes.
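
The abstract does not say which detection methods are used, so the following is only an illustrative sketch of the general idea of detecting suspect labels and removing them from the training data. It assumes a generic cross-validation disagreement filter: each example is held out, a classifier is trained on the rest, and the example is flagged when the prediction disagrees with its label. The nearest-centroid classifier, the single pause-duration feature, the labels "B" (boundary) and "N" (no boundary), and all function names are hypothetical stand-ins, not the authors' system.

```python
from statistics import mean


def train_centroids(xs, ys):
    """Stand-in classifier (assumption): one mean feature vector per label."""
    by_label = {}
    for x, y in zip(xs, ys):
        by_label.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*rows)] for y, rows in by_label.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sqdist(centroids[y]))


def flag_suspects(xs, ys, k=5):
    """Flag indices whose held-out prediction disagrees with the given label."""
    suspects = []
    for fold in range(k):
        held_out = [i for i in range(len(xs)) if i % k == fold]
        train = [i for i in range(len(xs)) if i % k != fold]
        model = train_centroids([xs[i] for i in train], [ys[i] for i in train])
        suspects.extend(i for i in held_out if predict(model, xs[i]) != ys[i])
    return sorted(suspects)


def filter_training_set(xs, ys, suspects):
    """Drop the flagged examples and return the cleaned training set."""
    drop = set(suspects)
    keep = [i for i in range(len(xs)) if i not in drop]
    return [xs[i] for i in keep], [ys[i] for i in keep]


# Hypothetical toy data: one feature (pause duration in seconds) per word boundary.
# Index 8 has a long pause but is labeled "N", mimicking a labeling inconsistency.
xs = [[0.05], [0.9], [0.06], [0.85], [0.07], [0.95], [0.08], [0.8], [0.9], [0.04]]
ys = ["N", "B", "N", "B", "N", "B", "N", "B", "N", "N"]

suspects = flag_suspects(xs, ys, k=5)
clean_xs, clean_ys = filter_training_set(xs, ys, suspects)
print(suspects)       # [8]
print(len(clean_xs))  # 9
```

A filter of this kind flags every example the classifier gets wrong, not only mislabeled ones. That matches the abstract's observation that most flagged examples turned out to be classifier errors, not labeling inconsistencies, so manual inspection of a sample of the flagged examples remains useful.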