We propose a phonotactic language model as a solution to spoken language identification (LID). In this framework, we define a single set of acoustic tokens to represent the acoustic activities found across the world's spoken languages. A voice tokenizer converts a spoken document into a text-like document of acoustic tokens, so that each spoken document can be represented as a count vector of acoustic tokens and token n-grams in a vector space. We apply latent semantic analysis to these vectors, in the same way it is applied in information retrieval, in order to capture the salient phonotactics present in spoken documents. This vector space modeling of spoken utterances constitutes a paradigm shift in LID technology and has proven very successful: it achieves a 12.4% error-rate reduction over one of the best reported results on the 1996 NIST Language Recognition Evaluation database.
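The pipeline described above can be sketched in a few lines: tokenized "spoken documents" become bigram count vectors, and a truncated SVD (the linear-algebra core of latent semantic analysis) projects them into a low-rank latent space where documents are compared by cosine similarity. This is a minimal illustration with toy token sequences, not the authors' actual tokenizer or evaluation setup; the token alphabet, documents, and rank `k` are assumptions for the sketch.

```python
import numpy as np

def ngram_counts(tokens, vocab, n=2):
    """Count token n-grams over a fixed vocabulary ordering."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return np.array([grams.count(g) for g in vocab], dtype=float)

# Toy "spoken documents": sequences of acoustic tokens (hypothetical alphabet).
docs = [
    ["a", "b", "a", "b", "c"],
    ["a", "b", "c", "a", "b"],
    ["c", "c", "b", "a", "c"],
]

# Vocabulary: all bigrams observed anywhere in the corpus, in a fixed order.
vocab = sorted({tuple(d[i:i + 2]) for d in docs for i in range(len(d) - 1)})

# Term-document style matrix: one bigram count vector per document.
X = np.stack([ngram_counts(d, vocab) for d in docs])  # shape: (docs, bigrams)

# Latent semantic analysis: rank-k truncated SVD of the term-document matrix.
U, s, Vt = np.linalg.svd(X.T, full_matrices=False)
k = 2
doc_latent = (np.diag(s[:k]) @ Vt[:k]).T  # each row: a document in latent space

def cos(u, v):
    """Cosine similarity between two latent document vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

In this toy corpus the first two documents share most of their bigrams, so their latent-space cosine similarity exceeds that of the first and third; the same mechanism, scaled up to real acoustic token streams, is what lets LSA surface the salient phonotactics of each language.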