Robust language identification based on fused phonotactic information with MLKSFM pre-classifier

  • Authors:
  • Liang Wang;Eliathamby Ambikairajah;Eric H. C. Choi

  • Affiliations:
  • School of EE&Telecom, University of New South Wales and ATP Research Laboratory, National ICT Australia, NSW, Australia;School of EE&Telecom, University of New South Wales and ATP Research Laboratory, National ICT Australia, NSW, Australia;ATP Research Laboratory, National ICT Australia, NSW, Australia

  • Venue:
  • ICME'09 Proceedings of the 2009 IEEE international conference on Multimedia and Expo
  • Year:
  • 2009

Abstract

In this paper we propose a novel language identification (LID) system that uses fused phonotactic information. The phase spectrum of the speech signal is used together with the magnitude spectrum to obtain a more robust feature representation. Parallel Broad Phone-class Recognition followed by Language Model (PBPRLM) is used to remove the bias in the likelihood scores that the unequal sizes of the phone inventories introduce in traditional PPRLM systems. The likelihood scores from the MFCC-based and group-delay-based PPRLM and PBPRLM systems are fused using a Gaussian Mixture Model. Furthermore, a pre-classifier based on Kohonen's map is used to maintain system robustness when handling a large number of target languages. With the proposed system we achieve an equal error rate (EER) of 6.7% on the 2005 NIST LRE and an LID recognition rate of 83.9% on a 22-language task.
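
The abstract does not say how the group-delay features are derived from the phase spectrum. As a rough illustration only, the sketch below computes the plain group delay function of a single frame using the standard identity tau(w) = (X_R Y_R + X_I Y_I) / |X|^2, where X is the spectrum of x[n] and Y is the spectrum of n*x[n]. The function name `group_delay`, the FFT size, and the `eps` floor are assumptions made for this example; the authors may well use a modified or smoothed variant.

```python
import numpy as np

def group_delay(frame, n_fft=512, eps=1e-10):
    """Plain group delay (in samples) of one frame at each rFFT bin.

    Hypothetical helper for illustration; not the paper's feature extractor.
    """
    frame = np.asarray(frame, dtype=float)
    n = np.arange(len(frame))          # sample indices 0..N-1
    X = np.fft.rfft(frame, n_fft)      # spectrum of x[n]
    Y = np.fft.rfft(n * frame, n_fft)  # spectrum of n * x[n]
    power = X.real**2 + X.imag**2
    # Floor the denominator so near-zero spectral bins do not divide by zero.
    return (X.real * Y.real + X.imag * Y.imag) / np.maximum(power, eps)

# Sanity check: an impulse delayed by 5 samples has a constant group delay of 5.
impulse = np.zeros(64)
impulse[5] = 1.0
tau = group_delay(impulse)
```

Computing the delay from the two spectra avoids explicit phase unwrapping, which is one reason this form is commonly preferred over differentiating the phase directly.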