Audio Signal Feature Extraction and Classification Using Local Discriminant Bases

  • Authors:
  • K. Umapathy; S. Krishnan; R. K. Rao

  • Affiliations:
  • Dept. of Electr. & Comput. Eng., Univ. of Western Ontario, London, Ont.

  • Venue:
  • IEEE Transactions on Audio, Speech, and Language Processing
  • Year:
  • 2007

Abstract

Audio feature extraction plays an important role in analyzing and characterizing audio content. Auditory scene analysis, content-based retrieval, indexing, and fingerprinting of audio are a few of the applications that require efficient feature extraction. The key to extracting strong features that characterize the complex nature of audio signals is to identify their discriminatory subspaces. In this paper, we propose an audio feature extraction and multigroup classification scheme that focuses on identifying discriminatory time-frequency subspaces using the local discriminant bases (LDB) technique. Two dissimilarity measures were used in selecting the LDB nodes and extracting features from them. The extracted features were then fed to a linear discriminant analysis (LDA)-based classifier for a three-level hierarchical classification of audio signals into ten classes. In the first level, the audio signals were grouped into artificial and natural sounds. Each of the first-level groups was subdivided to form the second-level groups, namely instrumental, automobile, human, and nonhuman sounds. The third level was formed by subdividing the four second-level groups into the final ten classes (drums, flute, piano, aircraft, helicopter, male, female, animals, birds, and insects). A database of 213 audio signals was used in this study, and average classification accuracies of 83% for the first level (113 artificial and 100 natural sounds), 92% for the second level (73 instrumental and 40 automobile sounds; 40 human and 60 nonhuman sounds), and 89% for the third level (27 drum, 15 flute, and 31 piano sounds; 23 aircraft and 17 helicopter sounds; 20 male and 20 female speech samples; 20 animal, 20 bird, and 20 insect sounds) were achieved. In addition, a separate classification was also performed combining the LDB features with mel-frequency cepstral coefficients (MFCCs).
The average classification accuracies achieved using the combined features were 91% for the first level, 99% for the second level, and 95% for the third level.
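The decision stage at each node of the hierarchy described above can be sketched as a two-class Fisher linear discriminant, the classical form of LDA. The sketch below is a minimal, self-contained illustration, not the authors' implementation: it uses synthetic 2-D Gaussian clusters as stand-ins for the extracted LDB features, and the function names (`fisher_lda_fit`, `fisher_lda_predict`) are hypothetical.

```python
import numpy as np

def fisher_lda_fit(X0, X1):
    """Fit a two-class Fisher linear discriminant.

    X0, X1: (n_samples, n_features) arrays for class 0 and class 1.
    Returns the projection vector w and a midpoint decision threshold c.
    """
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix (unnormalized covariance sums)
    Sw = np.cov(X0, rowvar=False) * (len(X0) - 1) \
       + np.cov(X1, rowvar=False) * (len(X1) - 1)
    # Fisher direction: Sw^{-1} (m1 - m0)
    w = np.linalg.solve(Sw, m1 - m0)
    # Threshold at the projection of the midpoint between class means
    c = w @ (m0 + m1) / 2.0
    return w, c

def fisher_lda_predict(X, w, c):
    """Project samples onto w and threshold: returns 0/1 class labels."""
    return (X @ w > c).astype(int)

if __name__ == "__main__":
    # Synthetic stand-ins for first-level features:
    # class 0 ~ "artificial", class 1 ~ "natural" (illustrative only)
    rng = np.random.default_rng(0)
    X0 = rng.normal([0.0, 0.0], 0.5, size=(50, 2))
    X1 = rng.normal([3.0, 3.0], 0.5, size=(50, 2))
    w, c = fisher_lda_fit(X0, X1)
    acc = (fisher_lda_predict(X0, w, c) == 0).mean() / 2 \
        + (fisher_lda_predict(X1, w, c) == 1).mean() / 2
    print(f"balanced accuracy: {acc:.2f}")
```

In the paper's three-level scheme, a classifier of this kind would sit at each node of the hierarchy (artificial vs. natural at the root, then instrumental vs. automobile and so on), with each node trained only on the signals routed to it.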