Audio Classification and Retrieval Using Wavelets and Gaussian Mixture Models

Authors:
Ching-Hua Chuan
Affiliations:
School of Computing, University of North Florida, Jacksonville, FL, USA
Venue:
International Journal of Multimedia Data Engineering & Management
Year:
2013

Abstract

This paper presents an audio classification and retrieval system that uses wavelets to extract low-level acoustic features. The author performs multi-level decomposition with the discrete wavelet transform to extract acoustic features from audio recordings at different scales and times. The extracted features are then translated into a compact vector representation. Gaussian mixture models, trained with the expectation-maximization algorithm, are used to build models for audio classes and for individual audio examples. The system is evaluated on three audio classification tasks: speech/music, male/female speech, and music genre. The author also shows how wavelets and Gaussian mixture models can be used for class-based audio retrieval under two approaches: indexing using only wavelets versus indexing by Gaussian components. Evaluating the system through 10-fold cross-validation, the author shows the promising capability of wavelets and Gaussian mixture models for audio classification and retrieval. The author also compares how parameters, including frame size, wavelet level, number of Gaussian components, and sampling size, affect the performance of the Gaussian models.
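The feature-extraction step described in the abstract (multi-level wavelet decomposition of audio frames, followed by a compact vector representation) can be illustrated with a minimal sketch. The abstract does not state which wavelet family or which subband statistics the author used, so the Haar wavelet, the choice of mean absolute value and standard deviation per subband, and the names `haar_dwt_step`, `wavelet_features`, and `frame_features` are all assumptions made here for illustration only. The default frame size and decomposition level are likewise placeholders, not the paper's settings.

```python
import numpy as np

def haar_dwt_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail).

    Assumption: Haar is used only because it is the simplest wavelet;
    the paper's wavelet family is not given in the abstract.
    """
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                      # pad odd-length input with its last sample
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail

def wavelet_features(frame, levels=3):
    """Compact vector for one frame: two statistics per subband.

    Assumption: mean absolute value and standard deviation are
    illustrative statistics, not the author's documented features.
    """
    feats = []
    approx = np.asarray(frame, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        feats += [np.mean(np.abs(detail)), np.std(detail)]
    # Final approximation subband (coarsest scale)
    feats += [np.mean(np.abs(approx)), np.std(approx)]
    return np.array(feats)

def frame_features(signal, frame_size=512, levels=3):
    """Split a signal into non-overlapping frames and extract features per frame."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_size
    return np.array([
        wavelet_features(signal[i * frame_size:(i + 1) * frame_size], levels)
        for i in range(n_frames)
    ])
```

Each row of the matrix returned by `frame_features` is one frame's feature vector, with `2 * (levels + 1)` entries. In a pipeline like the one the abstract outlines, such rows would be the observations used to fit one Gaussian mixture model per audio class, for example with an EM-based implementation such as scikit-learn's `GaussianMixture`, with a test clip assigned to the class whose model gives the highest likelihood.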