Audio Signal Feature Extraction and Classification Using Local Discriminant Bases

  • Authors:
  • K. Umapathy; S. Krishnan; R. K. Rao

  • Affiliations:
  • Dept. of Electr. & Comput. Eng., Univ. of Western Ontario, London, Ont.

  • Venue:
  • IEEE Transactions on Audio, Speech, and Language Processing
  • Year:
  • 2007

Abstract

Audio feature extraction plays an important role in analyzing and characterizing audio content. Auditory scene analysis, content-based retrieval, indexing, and fingerprinting of audio are a few of the applications that require efficient feature extraction. The key to extracting strong features that characterize the complex nature of audio signals is to identify their discriminatory subspaces. In this paper, we propose an audio feature extraction and multigroup classification scheme that focuses on identifying discriminatory time-frequency subspaces using the local discriminant bases (LDB) technique. Two dissimilarity measures were used in selecting the LDB nodes and extracting features from them. The extracted features were then fed to a linear discriminant analysis (LDA)-based classifier for a three-level hierarchical classification of audio signals into ten classes. In the first level, the audio signals were grouped into artificial and natural sounds. Each of the first-level groups was subdivided to form the second-level groups, namely instrumental, automobile, human, and nonhuman sounds. The third level was formed by subdividing the four second-level groups into the final ten classes (drums, flute, piano, aircraft, helicopter, male, female, animals, birds, and insects). A database of 213 audio signals was used in this study, and average classification accuracies of 83% for the first level (113 artificial and 100 natural sounds), 92% for the second level (73 instrumental and 40 automobile sounds; 40 human and 60 nonhuman sounds), and 89% for the third level (27 drum, 15 flute, and 31 piano sounds; 23 aircraft and 17 helicopter sounds; 20 male and 20 female speech samples; 20 animal, 20 bird, and 20 insect sounds) were achieved. In addition, a separate classification was also performed combining the LDB features with mel-frequency cepstral coefficients (MFCCs).
The average classification accuracies achieved using the combined features were 91% for the first level, 99% for the second level, and 95% for the third level.
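The decision stage at each node of the hierarchy described above can be sketched as a two-class Fisher linear discriminant, the classical form of LDA. The sketch below is a minimal, self-contained illustration, not the authors' implementation: it uses synthetic 2-D Gaussian clusters as stand-ins for the extracted LDB features, and the function names (`fisher_lda_fit`, `fisher_lda_predict`) are hypothetical.

```python
import numpy as np

def fisher_lda_fit(X0, X1):
    """Fit a two-class Fisher linear discriminant.

    X0, X1: (n_samples, n_features) arrays for class 0 and class 1.
    Returns the projection vector w and a midpoint decision threshold c.
    """
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix (unnormalized covariance sums)
    Sw = np.cov(X0, rowvar=False) * (len(X0) - 1) \
       + np.cov(X1, rowvar=False) * (len(X1) - 1)
    # Fisher direction: Sw^{-1} (m1 - m0)
    w = np.linalg.solve(Sw, m1 - m0)
    # Threshold at the projection of the midpoint between class means
    c = w @ (m0 + m1) / 2.0
    return w, c

def fisher_lda_predict(X, w, c):
    """Project samples onto w and threshold: returns 0/1 class labels."""
    return (X @ w > c).astype(int)

if __name__ == "__main__":
    # Synthetic stand-ins for first-level features:
    # class 0 ~ "artificial", class 1 ~ "natural" (illustrative only)
    rng = np.random.default_rng(0)
    X0 = rng.normal([0.0, 0.0], 0.5, size=(50, 2))
    X1 = rng.normal([3.0, 3.0], 0.5, size=(50, 2))
    w, c = fisher_lda_fit(X0, X1)
    acc = (fisher_lda_predict(X0, w, c) == 0).mean() / 2 \
        + (fisher_lda_predict(X1, w, c) == 1).mean() / 2
    print(f"balanced accuracy: {acc:.2f}")
```

In the paper's three-level scheme, a classifier of this kind would sit at each node of the hierarchy (artificial vs. natural at the root, then instrumental vs. automobile and so on), with each node trained only on the signals routed to it.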