Hard vs. Fuzzy Clustering for Speech Utterance Categorization

Authors:
Amparo Albalate; David Suendermann
Affiliations:
Institute of Information Technology, University of Ulm; SpeechCycle Inc., NY, USA
Venue:
PIT '08 Proceedings of the 4th IEEE tutorial and research workshop on Perception and Interactive Technologies for Speech-Based Systems: Perception in Multimodal Dialogue Systems
Year:
2008

Abstract

To detect and describe categories in a given set of utterances without supervision, one may represent the utterances as vectors and apply clustering to the resulting vector space. This paper compares hard and fuzzy word clustering approaches applied to 'almost' unsupervised utterance categorization for a technical support dialog system. Here, 'almost' means that only one sample utterance is given per category, which allows the performance of the clustering techniques to be evaluated objectively. For this purpose, the categorization accuracy of the respective techniques is measured against a manually annotated test corpus of more than 3000 utterances.
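
The distinction between hard and fuzzy clustering mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not give their algorithms, features, or parameters. The sketch assumes Euclidean distance, two hypothetical 2-D centroids, and the standard fuzzy c-means membership formula with fuzzifier m = 2. The names `hard_assign` and `fuzzy_memberships` are invented for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hard_assign(point, centroids):
    """Hard clustering: the point belongs to exactly one cluster (the nearest)."""
    dists = [distance(point, c) for c in centroids]
    return dists.index(min(dists))


def fuzzy_memberships(point, centroids, m=2.0):
    """Fuzzy c-means style memberships: one degree in [0, 1] per cluster, summing to 1.

    m > 1 is the fuzzifier (an assumed value, not taken from the paper).
    """
    dists = [distance(point, c) for c in centroids]
    # A point lying exactly on one or more centroids belongs fully to them.
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]


# Hypothetical toy data: two cluster centres and one item to categorize.
centroids = [[0.0, 0.0], [4.0, 0.0]]
point = [1.0, 0.0]

hard_label = hard_assign(point, centroids)        # a single cluster index
memberships = fuzzy_memberships(point, centroids)  # one degree per cluster

print(hard_label, memberships)
```

In the hard case the item receives a single label, whereas the fuzzy case keeps a graded membership in every cluster, so an ambiguous item can contribute to more than one category.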