This paper presents a strategy to optimize the phonotactic front-end for spoken language recognition. This is achieved by selecting a subset of phones from an existing phone recognizer's inventory such that only the phones that best discriminate each target language are retained. Each such phone subset is used to construct a target-oriented phone tokenizer (TOPT). In this study, we examine different approaches to constructing such phone tokenizers for the front-end of a parallel phone recognition followed by vector space modeling (PPR-VSM) system. We show that the target-oriented phone tokenizers derived from language-specific phone recognizers are more effective than the original parallel phone recognizers. Our experimental results also show that the target-oriented phone tokenizers derived from universal phone recognizers achieve better performance than those derived from language-specific phone recognizers. Using the proposed target-oriented phone tokenizers as the phonotactic front-end, language recognition performance is significantly improved without the need for additional training samples. We achieve equal error rates (EER) of 1.27%, 1.42%, and 2.73% on the NIST 1996, 2003, and 2007 LRE databases, respectively, for the 30-s closed-set tests. This system is one of the subsystems in IIR's submission to NIST 2007 LRE.
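The core idea — keeping only the most discriminative phones per target language — can be illustrated with a minimal sketch. The selection criterion and scores below are hypothetical placeholders, not the paper's actual discrimination measure; only the top-N ranking step reflects the general approach of building a per-language phone subset from a recognizer's inventory.

```python
# Hedged sketch of target-oriented phone selection: for one target
# language, rank the phones in a recognizer's inventory by some
# per-language discrimination score and keep the top subset.
# The scores here are illustrative placeholders, not real measurements.

def select_topt_phones(phone_scores, subset_size):
    """phone_scores: dict mapping phone -> discrimination score for a
    single target language. Returns the subset_size highest-scoring
    phones, which would form that language's TOPT inventory."""
    ranked = sorted(phone_scores, key=phone_scores.get, reverse=True)
    return ranked[:subset_size]

# Hypothetical inventory scores for one target language
scores = {"aa": 0.92, "iy": 0.85, "sh": 0.40, "t": 0.10, "ng": 0.77}
print(select_topt_phones(scores, 3))  # -> ['aa', 'iy', 'ng']
```

In a full PPR-VSM front-end, one such subset would be derived per target language (from either a language-specific or a universal phone recognizer), and each subset would define its own tokenizer whose phone-n-gram statistics feed the vector space model.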