A Hierarchical System Design for Language Identification

Authors:
Haipeng Wang; Xiang Xiao; Xiang Zhang; Jianping Zhang; Yonghong Yan
Venue:
ISISE '09 Proceedings of the 2009 Second International Symposium on Information Science and Engineering
Year:
2009

Abstract

Token-based approaches have proven effective for spoken language identification (LID). Traditionally, speech utterances are first decoded into token sequences, and LID is then performed on these sequences using either n-gram language models or support vector machines. In this paper, we propose a hierarchical system design that uses a group of Bayesian logistic regression models as score generators. The score generators are followed by a score merger, which outputs the final identification results. Experiments conducted on the NIST LRE 2007 databases demonstrate that the proposed approach achieves competitive performance compared with traditional token-based methods.
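
The two-stage structure described in the abstract (score generators feeding a score merger) can be illustrated with a minimal sketch. Everything specific here is an assumption, since the abstract does not give these details: one generator per target language trained one-vs-rest, token-bigram relative frequencies as features, a Gaussian prior on the weights with a MAP point estimate standing in for Bayesian logistic regression, a softmax standing in for the score merger, and made-up toy token sequences. It is not the authors' implementation.

```python
import math
from collections import Counter


def bigram_features(tokens):
    """Relative frequencies of token bigrams (an assumed feature choice)."""
    counts = Counter(zip(tokens, tokens[1:]))
    total = sum(counts.values()) or 1
    return {bigram: c / total for bigram, c in counts.items()}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class MapLogisticRegression:
    """Binary logistic regression with a zero-mean Gaussian prior on the
    weights, fitted by gradient descent to the MAP estimate. This is a
    simple stand-in for Bayesian logistic regression, not the paper's model."""

    def __init__(self, prior_variance=1.0, learning_rate=0.5, epochs=200):
        self.prior_variance = prior_variance
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.w = {}
        self.b = 0.0

    def score(self, x):
        """Log-odds that the utterance belongs to the target language."""
        return self.b + sum(self.w.get(k, 0.0) * v for k, v in x.items())

    def fit(self, xs, ys):
        n = len(xs)
        for _ in range(self.epochs):
            grad_w, grad_b = {}, 0.0
            for x, y in zip(xs, ys):
                err = sigmoid(self.score(x)) - y
                grad_b += err
                for k, v in x.items():
                    grad_w[k] = grad_w.get(k, 0.0) + err * v
            # The Gaussian prior adds w / variance to the weight gradient.
            for k in set(grad_w) | set(self.w):
                g = grad_w.get(k, 0.0) + self.w.get(k, 0.0) / self.prior_variance
                self.w[k] = self.w.get(k, 0.0) - self.learning_rate * g / n
            self.b -= self.learning_rate * grad_b / n
        return self


def merge_scores(scores):
    """Assumed score merger: softmax over the generators' log-odds."""
    top = max(scores.values())
    exps = {lang: math.exp(s - top) for lang, s in scores.items()}
    total = sum(exps.values())
    return {lang: e / total for lang, e in exps.items()}


# Hypothetical toy token sequences; a real system would use tokenizer output.
TRAIN = [
    ("en", "dh ax k ae t s ae t".split()),
    ("en", "dh ax d ao g s ae t".split()),
    ("es", "e l g a t o e l".split()),
    ("es", "e l p e r o e l".split()),
]

FEATURES = [bigram_features(tokens) for _, tokens in TRAIN]
LANGUAGES = sorted({lang for lang, _ in TRAIN})

# Stage 1: one score generator per language, trained one-vs-rest.
GENERATORS = {
    lang: MapLogisticRegression().fit(
        FEATURES, [1.0 if label == lang else 0.0 for label, _ in TRAIN]
    )
    for lang in LANGUAGES
}


def identify(tokens):
    """Stage 2: merge the generator scores and pick the top language."""
    x = bigram_features(tokens)
    scores = {lang: gen.score(x) for lang, gen in GENERATORS.items()}
    posteriors = merge_scores(scores)
    return max(posteriors, key=posteriors.get), posteriors
```

Calling `identify("dh ax k ae t".split())` returns the top-scoring language label together with the merged per-language scores. A trained fusion model could replace the softmax in `merge_scores` without changing the generators, which is the main appeal of separating the two stages.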