A phonotactic-semantic paradigm for automatic spoken document classification

  • Authors:
  • Bin Ma; Haizhou Li

  • Affiliations:
  • Institute for Infocomm Research, Keng Terrace, Singapore; Institute for Infocomm Research, Keng Terrace, Singapore

  • Venue:
  • Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval

  • Year:
  • 2005

Abstract

We demonstrate a phonotactic-semantic paradigm for spoken document classification. In this framework, we define a set of acoustic words, rather than lexical words, to represent acoustic activities in spoken languages. The strategy for selecting the acoustic vocabulary is studied by comparing different feature selection methods. With an appropriate acoustic vocabulary, a voice tokenizer converts a spoken document into a text-like document of acoustic words. A spoken document can thus be represented by a count vector, called a bag-of-sounds vector, which characterizes the document's semantic domain. We study two phonotactic-semantic classifiers, a support vector machine classifier and a latent semantic analysis classifier, and examine their properties. The phonotactic-semantic framework constitutes a new paradigm in spoken document classification, as demonstrated by its success on the spoken language identification task: it achieves an 18.2% error reduction over state-of-the-art benchmark performance on the 1996 NIST Language Recognition Evaluation database.
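
The pipeline the abstract describes — tokenize speech into acoustic words, count them into a bag-of-sounds vector, then classify — maps naturally onto standard text-classification tooling. Below is a minimal sketch, assuming scikit-learn; the toy phone tokens (p1 … p9), the language labels, the phone-bigram feature choice, and the centroid/cosine decision rule for the LSA-style classifier are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.decomposition import TruncatedSVD

# Toy phone-token sequences standing in for voice-tokenizer output,
# with hypothetical language labels.
train_docs = [
    "p1 p7 p3 p3 p9 p1 p7 p3",
    "p1 p7 p7 p3 p9 p9 p1 p3",
    "p2 p5 p5 p8 p2 p5 p8 p4",
    "p2 p8 p5 p4 p2 p5 p8 p8",
]
train_labels = ["lang_a", "lang_a", "lang_b", "lang_b"]

# Bag-of-sounds: each tokenized document becomes a count vector over
# acoustic words -- here, phone unigrams and bigrams (a phonotactic feature).
vectorizer = CountVectorizer(ngram_range=(1, 2), token_pattern=r"\S+")
X = vectorizer.fit_transform(train_docs)

# Classifier 1: a linear SVM trained directly on the count vectors.
svm = LinearSVC().fit(X, train_labels)

# Classifier 2, LSA-style: project the count vectors into a low-rank
# latent space via truncated SVD, then score a test document by cosine
# similarity to each class centroid in that space.
lsa = TruncatedSVD(n_components=2).fit(X)
Z = lsa.transform(X)
centroids = {
    lab: Z[[i for i, l in enumerate(train_labels) if l == lab]].mean(axis=0)
    for lab in set(train_labels)
}

def lsa_classify(doc: str) -> str:
    # Project the new document and pick the nearest class centroid.
    z = lsa.transform(vectorizer.transform([doc]))[0]
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return max(centroids, key=lambda lab: cos(z, centroids[lab]))

test_doc = "p1 p7 p3 p9 p1 p7"
print("SVM prediction:", svm.predict(vectorizer.transform([test_doc]))[0])
print("LSA prediction:", lsa_classify(test_doc))
```

Here LinearSVC plays the role of the support vector machine classifier, and TruncatedSVD with cosine scoring against class centroids stands in for the latent semantic analysis classifier; the paper's actual acoustic vocabulary selection and classifier formulations differ in detail.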