Using the Fisher kernel method for Web audio classification

Authors:
P. J. Moreno;R. Rifkin
Affiliations:
Cambridge Res. Lab., Compaq Comput. Corp., Cambridge, MA, USA;-
Venue:
ICASSP '00: Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 04
Year:
2000

Citing 0
Cited 8

A singer identification technique for content-based classification of MP3 music objects
Proceedings of the eleventh international conference on Information and knowledge management

Virtual microphones for multichannel audio resynthesis
EURASIP Journal on Applied Signal Processing

A multi-class classification strategy for Fisher scores: Application to signer independent sign language recognition
Pattern Recognition

Online signature verification with support vector machines based on LCSS kernel functions
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics - Special issue on gait analysis

Multimedia data mining: state of the art and challenges
Multimedia Tools and Applications

Kernels for longitudinal data with variable sequence length and sampling intervals
Neural Computation

Toward a sound analysis system for telemedicine
FSKD'05 Proceedings of the Second international conference on Fuzzy Systems and Knowledge Discovery - Volume Part II

Fisher kernel based relevance feedback for multimodal video retrieval
Proceedings of the 3rd ACM International Conference on Multimedia Retrieval

Abstract

As the multimedia content of the Web increases, techniques to automatically classify this content become more important. We present a system to classify audio files collected from the Web. The system classifies any audio file as belonging to one of three categories: speech, music, or other. To classify the audio files, we use the technique of Fisher kernels. The technique, as proposed by Jaakkola (1998), assumes a probabilistic generative model for the data, in our case a Gaussian mixture model (GMM). A discriminative classifier then uses the GMM as an intermediate step to produce appropriate feature vectors. Support vector machines are our choice of discriminative classifier. We present classification results on a randomly collected set of more than 173 hours of Web audio. We believe our results represent one of the first realistic studies of audio classification performance on found data. Our final system yielded a classification rate of 81.8%.
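
The feature-extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a diagonal-covariance GMM, takes the Fisher score as the gradient of the average per-frame log-likelihood with respect to the component means only, and uses hypothetical function names (`responsibilities`, `fisher_score_means`). The abstract does not state which parameters the gradient is taken over or how frames are pooled, so both choices are assumptions.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def responsibilities(x, weights, means, variances):
    """Posterior probability of each mixture component given frame x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fisher_score_means(frames, weights, means, variances):
    """Fixed-length feature vector for a variable-length list of frames.

    Assumption: the score is the gradient of the average per-frame
    log-likelihood with respect to the component means, flattened
    component by component.
    """
    dim = len(means[0])
    score = [[0.0] * dim for _ in means]
    for x in frames:
        gamma = responsibilities(x, weights, means, variances)
        for k, (m, v) in enumerate(zip(means, variances)):
            for d in range(dim):
                # d/d(mu_kd) log p(x) = gamma_k * (x_d - mu_kd) / var_kd
                score[k][d] += gamma[k] * (x[d] - m[d]) / v[d]
    n = len(frames)
    return [s / n for row in score for s in row]


# Toy two-component, one-dimensional GMM (illustrative values only).
toy_weights = [0.5, 0.5]
toy_means = [[-1.0], [1.0]]
toy_variances = [[1.0], [1.0]]
toy_score = fisher_score_means([[0.0]], toy_weights, toy_means, toy_variances)
```

Every audio file, whatever its duration, maps to a vector of the same length (number of components times frame dimension), which is what allows a discriminative classifier such as a support vector machine to be trained on the result.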