Multi-stream Fusion for Speaker Classification

  • Authors:
  • Izhak Shafran

  • Affiliations:
  • Computer Science & Electrical Engineering, OGI School of Science & Engineering, Oregon Health & Science University (OHSU), 20000 NW Walker Rd, Beaverton, OR 97006

  • Venue:
  • Speaker Classification I
  • Year:
  • 2007

Abstract

Accurate detection of speaker traits has clear benefits in improving speech interfaces, in finding useful information in multi-media archives, and in medical applications. Humans infer a variety of traits, robustly and effortlessly, from the available sources of information, which may include vision and gestures in addition to voice. This paper examines techniques for integrating information from multiple sources, which may be broadly categorized into those operating in feature space, model space, score space and kernel space. Integration in feature space and model space has been studied extensively in the audio-visual literature, so here we focus on score space and kernel space. There is a large number of potential schemes for integration in kernel space, and here we examine a particular instance that can integrate both acoustic and lexical information for affect recognition. The example is taken from a widely deployed real-world application. We compare the kernel-based classifier with other competing techniques and demonstrate how it provides a general and flexible framework for detecting speaker characteristics.
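
To make kernel-space fusion concrete, the following is a minimal Python sketch of one standard scheme, not the paper's actual system: a convex combination of a hypothetical RBF kernel over acoustic features and a hypothetical linear kernel over lexical features, used as a precomputed Gram matrix for a support-vector classifier. The kernel choices, the mixing weight alpha, and the toy data are all illustrative assumptions.

    import numpy as np
    from sklearn.svm import SVC

    def rbf_kernel_matrix(X, gamma=0.1):
        # Gram matrix of a Gaussian (RBF) kernel: exp(-gamma * ||x_i - x_j||^2).
        sq = np.sum(X ** 2, axis=1)
        dist2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
        return np.exp(-gamma * dist2)

    def linear_kernel_matrix(X):
        # Gram matrix of a linear kernel: plain inner products.
        return X @ X.T

    # Toy stand-ins for per-utterance features; a real system would use
    # acoustic statistics (e.g. pitch, energy) and lexical (word-based) features.
    rng = np.random.default_rng(0)
    X_acoustic = rng.normal(size=(40, 12))
    X_lexical = rng.normal(size=(40, 100))
    y = rng.integers(0, 2, size=40)  # binary affect label for illustration

    # Kernel-space fusion: a convex combination of valid kernels is itself
    # a valid kernel, so one classifier can draw on both streams at once.
    alpha = 0.5  # mixing weight; in practice tuned on held-out data
    K = alpha * rbf_kernel_matrix(X_acoustic) + (1.0 - alpha) * linear_kernel_matrix(X_lexical)

    clf = SVC(kernel="precomputed")
    clf.fit(K, y)
    print("training accuracy:", clf.score(K, y))

The design choice that makes this work is that a nonnegative weighted sum of positive semi-definite kernels is again positive semi-definite, so the combined matrix remains a valid kernel regardless of how heterogeneous the underlying feature streams are.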