This paper presents an audio classification and retrieval system that uses wavelets to extract low-level acoustic features. The author applies a multiple-level decomposition with the discrete wavelet transform to extract acoustic features from audio recordings at different scales and times. The extracted features are then translated into a compact vector representation. Gaussian mixture models, trained with the expectation-maximization algorithm, are used to build models for audio classes and for individual audio examples. The system is evaluated on three audio classification tasks: speech/music, male/female speech, and music genre. The author also shows how wavelets and Gaussian mixture models can be used for class-based audio retrieval in two approaches: indexing using only wavelets versus indexing by Gaussian components. Evaluating the system through 10-fold cross-validation, the author shows the promising capability of wavelets and Gaussian mixture models for audio classification and retrieval. The paper also compares how parameters including frame size, wavelet level, number of Gaussian components, and sampling size affect the performance of the Gaussian models.
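The multiple-level decomposition and compact feature vector described above can be illustrated with a minimal sketch. The abstract does not say which wavelet family or which per-subband statistic the author used, so the Haar wavelet, the mean-energy statistic, and the names `haar_step` and `wavelet_features` are all assumptions made here for illustration only.

```python
import math


def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients.

    Samples are processed in pairs; a trailing unpaired sample is dropped.
    """
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def wavelet_features(frame, levels=3):
    """Hypothetical compact feature vector for one audio frame.

    Holds the mean energy of the detail coefficients at each level,
    followed by the mean energy of the final approximation coefficients.
    """
    features = []
    approx = list(frame)
    for _ in range(levels):
        if len(approx) < 2:
            break  # frame too short for another decomposition level
        approx, detail = haar_step(approx)
        features.append(sum(d * d for d in detail) / len(detail))
    features.append(sum(a * a for a in approx) / len(approx))
    return features


# Example: a 64-sample frame containing four cycles of a sine wave.
frame = [math.sin(2 * math.pi * 4 * n / 64) for n in range(64)]
features = wavelet_features(frame, levels=3)
```

Each frame yields one vector of `levels + 1` values. Vectors of this kind, collected over many frames, are the sort of input a Gaussian mixture model could be fitted to. The frame size and number of levels here are arbitrary and correspond only loosely to the parameters the paper compares.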