Probabilistic scoring using decision trees for fast and scalable speaker recognition

Authors:
Gilles Gonon;Frédéric Bimbot;Rémi Gribonval
Affiliations:
IRISA/METISS (CNRS & INRIA), Campus Universitaire de Beaulieu, 35042 Rennes Cedex, France;IRISA/METISS (CNRS & INRIA), Campus Universitaire de Beaulieu, 35042 Rennes Cedex, France;IRISA/METISS (CNRS & INRIA), Campus Universitaire de Beaulieu, 35042 Rennes Cedex, France
Venue:
Speech Communication
Year:
2009

Abstract

In the context of fast and low-cost speaker recognition, this article investigates several techniques based on decision trees. A new approach is introduced in which the trees estimate a score function rather than return a decision among classes. This technique is developed to approximate the GMM log-likelihood ratio (LLR) score function. Building on this approach, three extensions are derived to improve the accuracy of the proposed trees. The first studies the quantization of the LLR function in order to build classification trees on the LLR values. The second uses knowledge of the GMM distribution of the acoustic features in order to build oblique trees. The third consists of using a low-complexity score function in each of the tree leaves. A series of comparative experiments is performed on the NIST 2005 speaker recognition evaluation data in order to evaluate the impact of the proposed improvements in terms of efficiency, execution time and algorithmic complexity. Compared with a baseline system with an Equal Error Rate (EER) of 9.6% on the NIST 2005 evaluation, the best tree-based configuration achieves an EER of 12.9%, with a computational cost adapted to embedded devices and an execution time suitable for real-time speaker identification.
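The core idea, a tree that returns a score instead of a class label, can be illustrated with a toy sketch. The code below is not the authors' system: it assumes a one-dimensional feature, two small hypothetical mixtures (`SPEAKER` and `WORLD`, with made-up parameters), and a plain CART-style regression tree with a constant score in each leaf. It only shows how the LLR of two GMMs might be approximated by a structure that costs a few comparisons per frame.

```python
import math
import random


def gmm_logpdf(x, weights, means, variances):
    """Log-density of a 1-D Gaussian mixture at x (log-sum-exp for stability)."""
    terms = [
        math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def llr(x, speaker, world):
    """Log-likelihood ratio between a speaker model and a world model."""
    return gmm_logpdf(x, *speaker) - gmm_logpdf(x, *world)


def build_tree(points, depth):
    """Fit a regression tree to (x, score) pairs sorted by x.

    Each split minimises the summed squared error of the two children;
    each leaf stores the mean score of the points it contains.
    """
    ys = [y for _, y in points]
    n = len(points)
    mean = sum(ys) / n
    if depth == 0 or n < 4:
        return ("leaf", mean)
    total, total_sq = sum(ys), sum(y * y for y in ys)
    left_sum = left_sq = 0.0
    best = None
    for i in range(1, n):
        left_sum += ys[i - 1]
        left_sq += ys[i - 1] ** 2
        if points[i][0] == points[i - 1][0]:
            continue  # cannot place a threshold between equal x values
        right_sum = total - left_sum
        sse = (left_sq - left_sum ** 2 / i) + (
            (total_sq - left_sq) - right_sum ** 2 / (n - i)
        )
        if best is None or sse < best[0]:
            best = (sse, i)
    if best is None:
        return ("leaf", mean)
    i = best[1]
    threshold = 0.5 * (points[i - 1][0] + points[i][0])
    return (
        "node",
        threshold,
        build_tree(points[:i], depth - 1),
        build_tree(points[i:], depth - 1),
    )


def tree_score(tree, x):
    """Score one frame: at most `depth` comparisons, no exponentials."""
    while tree[0] == "node":
        _, threshold, left, right = tree
        tree = left if x < threshold else right
    return tree[1]


# Hypothetical models: (weights, means, variances). Values are illustrative only.
SPEAKER = ([0.5, 0.5], [-1.0, 2.0], [1.0, 0.5])
WORLD = ([0.4, 0.6], [0.0, 1.0], [2.0, 2.0])

rng = random.Random(0)
train_x = sorted(rng.uniform(-5.0, 5.0) for _ in range(2000))
train = [(x, llr(x, SPEAKER, WORLD)) for x in train_x]
tree = build_tree(train, depth=6)

# Compare the tree against the exact LLR on held-out points.
test_x = [rng.uniform(-5.0, 5.0) for _ in range(500)]
exact = [llr(x, SPEAKER, WORLD) for x in test_x]
tree_mae = sum(abs(tree_score(tree, x) - e) for x, e in zip(test_x, exact)) / len(exact)
const = sum(y for _, y in train) / len(train)
const_mae = sum(abs(const - e) for e in exact) / len(exact)
print(f"tree MAE: {tree_mae:.3f}   constant-score MAE: {const_mae:.3f}")
```

In this sketch the leaves hold a constant score. The extensions named in the abstract would change that picture (quantized LLR classes, oblique rather than axis-aligned splits, or a low-complexity function per leaf), and real acoustic features are multi-dimensional, so the toy should be read as a picture of the trade-off between accuracy and per-frame cost rather than as a reproduction of the reported results.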