Robust language identification based on fused phonotactic information with MLKSFM pre-classifier

Authors:
Liang Wang;Eliathamby Ambikairajah;Eric H. C. Choi
Affiliations:
School of EE&Telecom, University of New South Wales and ATP Research Laboratory, National ICT Australia, NSW, Australia;School of EE&Telecom, University of New South Wales and ATP Research Laboratory, National ICT Australia, NSW, Australia;ATP Research Laboratory, National ICT Australia, NSW, Australia
Venue:
ICME'09 Proceedings of the 2009 IEEE international conference on Multimedia and Expo
Year:
2009


Abstract

In this paper we propose a novel language identification (LID) system that fuses phonotactic information from several sources. The phase spectrum of the speech signal is used alongside the magnitude spectrum to obtain a more robust feature representation. Parallel Broad Phone-class Recognition followed by Language Model (PBPRLM) is used to remove the bias in the likelihood scores that unequal phone-inventory sizes introduce in traditional Parallel Phone Recognition followed by Language Modeling (PPRLM) systems. The likelihood scores from the MFCC-based and group-delay-based PPRLM and PBPRLM systems are fused using a Gaussian Mixture Model. In addition, a pre-classifier based on Kohonen's self-organizing map keeps the system robust as the number of target languages grows. The proposed system achieves an equal error rate (EER) of 6.7% on the 2005 NIST Language Recognition Evaluation (LRE) and an identification rate of 83.9% on a 22-language task.
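
The abstract mentions group-delay-based features as the way phase information enters the system, but it does not say which group-delay variant is used. As an illustration only, the sketch below computes the plain group delay function of a single frame using the standard identity tau(k) = (X_R Y_R + X_I Y_I) / |X|^2, where X is the DFT of x[n] and Y is the DFT of n·x[n]. The function names `dft` and `group_delay`, the `eps` guard, and the naive DFT are assumptions made for this example and are not taken from the paper.

```python
import cmath
import math


def dft(signal):
    """Naive O(N^2) discrete Fourier transform of a real or complex sequence."""
    n_points = len(signal)
    return [
        sum(signal[n] * cmath.exp(-2j * math.pi * k * n / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


def group_delay(frame, eps=1e-12):
    """Group delay (in samples) of one frame at each DFT bin.

    Uses tau[k] = (Xr*Yr + Xi*Yi) / |X|^2 with X = DFT(x[n]) and
    Y = DFT(n * x[n]), which avoids explicit phase unwrapping.
    Illustrative only: the paper's exact feature may differ.
    """
    x_spec = dft(frame)
    y_spec = dft([n * sample for n, sample in enumerate(frame)])
    delays = []
    for x_k, y_k in zip(x_spec, y_spec):
        power = max(abs(x_k) ** 2, eps)  # guard against spectral zeros
        delays.append((x_k.real * y_k.real + x_k.imag * y_k.imag) / power)
    return delays


if __name__ == "__main__":
    # A unit impulse delayed by 3 samples has a constant group delay.
    frame = [0.0] * 16
    frame[3] = 1.0
    print(group_delay(frame))
```

For a unit impulse delayed by three samples, every bin should come out at approximately 3, which is a quick sanity check. A practical front end would apply this to windowed speech frames with an FFT, and would typically smooth or compress the result before deriving cepstral-style features.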