Fusion of acoustic and tokenization features for speaker recognition

Authors:
Rong Tong;Bin Ma;Kong-Aik Lee;Changhuai You;Donglai Zhu;Tomi Kinnunen;Hanwu Sun;Minghui Dong;Eng-Siong Chng;Haizhou Li
Affiliations:
Institute for Infocomm Research, Singapore;Institute for Infocomm Research, Singapore;Institute for Infocomm Research, Singapore;Institute for Infocomm Research, Singapore;Institute for Infocomm Research, Singapore;Institute for Infocomm Research, Singapore;Institute for Infocomm Research, Singapore;Institute for Infocomm Research, Singapore;School of Computer Engineering, Nanyang Technological University, Singapore;Institute for Infocomm Research, Singapore
Venue:
ISCSLP'06 Proceedings of the 5th international conference on Chinese Spoken Language Processing
Year:
2006

References:
SVMTorch: support vector machines for large-scale regression problems. The Journal of Machine Learning Research.
A phonotactic language model for spoken language identification. ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics.

Cited by:
Automatic voice activity detection in different speech applications. Proceedings of the 1st international conference on Forensic applications and techniques in telecommunications, information, and multimedia and workshop.
Comparative evaluation of maximum a posteriori vector quantization and Gaussian mixture models in speaker verification. Pattern Recognition Letters.
An overview of text-independent speaker recognition: From features to supervectors. Speech Communication.

Abstract

This paper describes our recent efforts to explore effective discriminative features for speaker recognition. Recent research indicates that appropriate fusion of features is critical to improving the performance of speaker recognition systems. We describe our approach for the NIST 2006 Speaker Recognition Evaluation (SRE). Our system integrates cepstral GMM modeling, cepstral SVM modeling, and tokenization at both the phone level and the frame level. Experimental results on both the NIST 2005 SRE corpus and the NIST 2006 SRE corpus are presented. The fused system achieved an equal error rate of 8.14% on the 1conv4w-1conv4w test condition of the NIST 2006 SRE.
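
The abstract reports an equal error rate (EER) for a fused system but does not state how the subsystem scores were combined. As a rough illustration only, the sketch below computes an EER from trial scores and fuses several subsystems with a weighted sum of z-normalized scores. The weighted-sum rule, the function names `equal_error_rate` and `fuse_scores`, and the synthetic scores are all assumptions made for this example, not the authors' method or data.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER: the point where false rejection equals false acceptance."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    # Use every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # False rejection: target trials scoring below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance: non-target trials scoring at or above the threshold.
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    idx = np.argmin(np.abs(frr - far))
    return (frr[idx] + far[idx]) / 2.0


def fuse_scores(score_matrix, weights):
    """Weighted sum of subsystem scores (assumed fusion rule, not from the paper).

    score_matrix has shape (n_trials, n_subsystems). Each column is
    z-normalized first so subsystems with different score ranges are comparable.
    """
    scores = np.asarray(score_matrix, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    return z @ (w / w.sum())


if __name__ == "__main__":
    # Synthetic scores for two hypothetical subsystems (illustration only).
    rng = np.random.default_rng(0)
    labels = np.array([1] * 200 + [0] * 800)  # 1 = target trial, 0 = non-target
    sys_a = rng.normal(loc=labels * 1.5, scale=1.0)
    sys_b = rng.normal(loc=labels * 1.0, scale=1.0)
    fused = fuse_scores(np.column_stack([sys_a, sys_b]), weights=[0.6, 0.4])
    for name, s in [("A", sys_a), ("B", sys_b), ("fused", fused)]:
        eer = equal_error_rate(s[labels == 1], s[labels == 0])
        print(f"{name}: EER = {100 * eer:.2f}%")
```

In practice the fusion weights would be tuned on a held-out development set (for instance, an earlier evaluation corpus) rather than fixed by hand as they are here.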