We demonstrate a phonotactic-semantic paradigm for spoken document categorization. In this framework, we define a set of acoustic words, rather than lexical words, to represent acoustic activities in spoken languages. We study the strategy for acoustic vocabulary selection by comparing different feature selection methods. With an appropriate acoustic vocabulary, a voice tokenizer converts a spoken document into a text-like document of acoustic words. The spoken document can then be represented by a count vector, called a bag-of-sounds vector, which characterizes its semantic domain. We study two phonotactic-semantic classifiers, the support vector machine classifier and the latent semantic analysis classifier, and their properties. The phonotactic-semantic framework constitutes a new paradigm in spoken document classification, as demonstrated by its success in the spoken language identification task. It achieves an 18.2% error reduction over state-of-the-art benchmark performance on the 1996 NIST Language Recognition Evaluation database.
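The step from tokenizer output to a count vector can be illustrated with a minimal sketch. It is not the authors' implementation: the token labels, the choice of n-grams of sound tokens as acoustic words, and the function names `acoustic_words` and `bag_of_sounds` are all assumptions made for illustration, and the vocabulary here is hand-picked rather than chosen by a feature selection method.

```python
from collections import Counter
from typing import List, Sequence, Tuple

AcousticWord = Tuple[str, ...]


def acoustic_words(tokens: Sequence[str], n: int = 2) -> List[AcousticWord]:
    """Slide a window of length n over the token sequence (assumed n-gram units)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_sounds(
    tokens: Sequence[str],
    vocabulary: Sequence[AcousticWord],
    n: int = 2,
) -> List[int]:
    """Count how often each vocabulary entry occurs in the tokenized document."""
    counts = Counter(acoustic_words(tokens, n))
    # Entries outside the vocabulary are ignored; absent entries count as 0.
    return [counts[word] for word in vocabulary]


# Hypothetical tokenizer output for one spoken document.
tokens = ["s1", "s2", "s1", "s2", "s3"]

# Hypothetical acoustic vocabulary (hand-picked for this example).
vocabulary = [("s1", "s2"), ("s2", "s1"), ("s2", "s3"), ("s3", "s1")]

vector = bag_of_sounds(tokens, vocabulary)
print(vector)  # [2, 1, 1, 0]
```

A vector of this kind, one per spoken document, is the sort of input that a classifier such as a support vector machine or a latent semantic analysis model would then consume.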