Hard vs. Fuzzy Clustering for Speech Utterance Categorization

  • Authors:
  • Amparo Albalate; David Suendermann

  • Affiliations:
  • Institute of Information Technology, University of Ulm; SpeechCycle Inc., NY, USA

  • Venue:
  • PIT '08 Proceedings of the 4th IEEE tutorial and research workshop on Perception and Interactive Technologies for Speech-Based Systems: Perception in Multimodal Dialogue Systems
  • Year:
  • 2008

Abstract

To detect and describe categories in a given set of utterances without supervision, one may apply clustering to a space in which the utterances are represented as vectors. This paper compares hard and fuzzy word clustering approaches applied to 'almost' unsupervised utterance categorization for a technical support dialog system. Here, 'almost' means that only one sample utterance is given per category, which allows the performance of the clustering techniques to be evaluated objectively. For this purpose, the categorization accuracy of the respective techniques is measured against a manually annotated test corpus of more than 3000 utterances.
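
To make the hard vs. fuzzy distinction concrete, the following minimal sketch contrasts the two kinds of assignment for a single item, given its distances to a set of cluster centroids. It is not the authors' method: the abstract does not say which algorithms were used, so the fuzzy rule below is the standard fuzzy c-means membership formula, and the function names, the fuzzifier `m`, and the sample distances are all assumptions made for illustration.

```python
def hard_assignment(distances):
    """Hard clustering: the item belongs to exactly one cluster, the nearest."""
    return min(range(len(distances)), key=lambda j: distances[j])


def fuzzy_memberships(distances, m=2.0):
    """Fuzzy clustering: the item gets a degree of membership in every cluster.

    Uses the fuzzy c-means rule u_j = d_j^(-e) / sum_k d_k^(-e), e = 2 / (m - 1).
    The fuzzifier m > 1 is an assumed parameter; memberships sum to 1.
    """
    # An item that coincides with one or more centroids belongs fully to them.
    zeros = [j for j, d in enumerate(distances) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if j in zeros else 0.0
                for j in range(len(distances))]
    exponent = 2.0 / (m - 1.0)
    weights = [d ** -exponent for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical distances from one item to three centroids.
sample = [1.0, 2.0, 4.0]
nearest = hard_assignment(sample)
memberships = fuzzy_memberships(sample)
```

With hard assignment the item contributes to one cluster only, whereas the fuzzy memberships let it contribute to several clusters in proportion to its closeness to each.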