Speaker Verification Using Support Vector Machines and High-Level Features

Authors:
W. M. Campbell;J. P. Campbell;T. P. Gleason;D. A. Reynolds;Wade Shen
Affiliations:
Massachusetts Inst. of Technol., Lexington;-;-;-;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2007

Citing 0
Cited 7

Invited paper: Automatic speech recognition: History, methods and challenges
Pattern Recognition

Ubiquitous and Robust Text-Independent Speaker Recognition for Home Automation Digital Life
UIC '08 Proceedings of the 5th international conference on Ubiquitous Intelligence and Computing

The likelihood ratio decision criterion for nuisance attribute projection in GMM speaker verification
EURASIP Journal on Advances in Signal Processing

Supplier selection based on hierarchical potential support vector machine
Expert Systems with Applications: An International Journal

Data-driven background dataset selection for SVM-based speaker verification
IEEE Transactions on Audio, Speech, and Language Processing

Development and evaluation of online text-independent speaker verification system for remote person authentication
International Journal of Speech Technology

VLSI design of an SVM learning core on sequential minimal optimization algorithm
IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Abstract

High-level characteristics such as word usage, pronunciation, phonotactics, and prosody have seen a resurgence in automatic speaker recognition over the last several years. With the availability of many conversation sides per speaker in current corpora, high-level systems now have enough data to characterize a speaker sufficiently. Although a significant amount of work has been done on finding novel high-level features, less work has been done on modeling these features. We describe a method of speaker modeling based upon support vector machines. Current high-level feature extraction produces sequences or lattices of tokens for a given conversation side. These sequences can be converted to n-gram counts and then to n-gram frequencies for that conversation side. We use support vector machine modeling of these n-gram frequencies for speaker verification. We derive a new kernel based upon linearizing a log-likelihood ratio scoring system. Generalizations of this method are shown to produce excellent results on a variety of high-level features. We demonstrate that our methods produce results significantly better than standard log-likelihood ratio modeling. We also demonstrate that our system can perform well in conjunction with standard cepstral speaker recognition systems.
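
The pipeline the abstract describes (token sequence, then n-gram counts, then n-gram frequencies, then a kernel for a support vector machine) can be illustrated with a minimal Python sketch. The abstract does not give the kernel's formula, so the weighting used here, dividing each n-gram frequency by the square root of its frequency in a background population, is an assumption about what a linearized log-likelihood ratio kernel might look like, not the authors' stated method. All function names and the toy phone tokens are hypothetical.

```python
from collections import Counter
from math import sqrt


def extract_ngrams(tokens, n=2):
    """Return the list of n-grams (as tuples) in one token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_frequencies(tokens, n=2):
    """Convert one conversation side's tokens to relative n-gram frequencies."""
    counts = Counter(extract_ngrams(tokens, n))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def background_frequencies(sides, n=2):
    """Pool n-gram counts over many conversation sides and normalize."""
    counts = Counter()
    for tokens in sides:
        counts.update(extract_ngrams(tokens, n))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def scale(freqs, background):
    """ASSUMED weighting: divide each frequency by sqrt(background frequency).

    N-grams absent from the background are dropped.
    """
    return {g: p / sqrt(background[g]) for g, p in freqs.items() if g in background}


def linear_kernel(a, b):
    """Inner product of two sparse feature vectors stored as dicts."""
    return sum(value * b[gram] for gram, value in a.items() if gram in b)


# Toy usage with made-up phone tokens for two conversation sides.
side_a = ["aa", "b", "aa", "b", "k"]
side_b = ["b", "aa", "b", "k", "k"]
background = background_frequencies([side_a, side_b])
vec_a = scale(ngram_frequencies(side_a), background)
vec_b = scale(ngram_frequencies(side_b), background)
similarity = linear_kernel(vec_a, vec_b)
```

Because the kernel is an ordinary inner product of the scaled vectors, those vectors could be passed to any linear SVM trainer, with the target speaker's conversation sides as positive examples and background speakers' sides as negative examples.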