Modeling nuisance variabilities with factor analysis for GMM-based audio pattern classification

  • Authors:
  • Driss Matrouf (University of Avignon, Laboratoire Informatique d'Avignon, 84911 Avignon, France)
  • Florian Verdet (University of Avignon, Laboratoire Informatique d'Avignon, 84911 Avignon, France, and University of Fribourg, Department of Informatics, 1700 Fribourg, Switzerland)
  • Mickaël Rouvier (University of Avignon, Laboratoire Informatique d'Avignon, 84911 Avignon, France)
  • Jean-François Bonastre (University of Avignon, Laboratoire Informatique d'Avignon, 84911 Avignon, France, and Institut Universitaire de France, France)
  • Georges Linarès (University of Avignon, Laboratoire Informatique d'Avignon, 84911 Avignon, France)

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2011


Abstract

Audio pattern classification is a particular statistical classification task that includes, for example, speaker recognition, language recognition, emotion recognition, speech recognition and, more recently, video genre classification. The features used in all these tasks are generally based on a short-term cepstral representation. Cepstral vectors carry both useful information and nuisance variability, which are difficult to separate in this domain. Recently, in the context of GMM-based recognizers, a novel approach using the Factor Analysis (FA) paradigm has been proposed for decomposing the target model into a useful-information component and a session-variability component. This approach is called Joint Factor Analysis (JFA), since it jointly models the nuisance variability and the useful information using the FA statistical method. The JFA approach has also been combined with Support Vector Machines, which are known for their discriminative power. In this article, we successfully apply this paradigm to three automatic audio processing applications: speaker verification, language recognition and video genre classification. This is done by applying the same process and using the same free software toolkit. We show that this approach yields a relative error reduction of over 50% in all the aforementioned audio processing tasks.
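
For context, a minimal sketch of the session-variability decomposition typically used in this kind of JFA setting (the notation here is illustrative and not taken from the paper itself): for a target class s observed in session h, the GMM mean supervector is modeled as

\[
M_{(s,h)} \;=\; m \;+\; D\,y_s \;+\; U\,x_h
\]

where \(m\) is the mean supervector of the universal background model, \(D\,y_s\) is the class-dependent offset carrying the useful information, and \(U\,x_h\) is the nuisance component, with \(U\) a low-rank matrix spanning the session-variability subspace and \(x_h\) the session factors estimated per recording.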