Audio Feature Extraction and Analysis for Scene Segmentation and Classification

  • Authors:
  • Zhu Liu; Yao Wang; Tsuhan Chen

  • Affiliations:
  • Polytechnic University, Brooklyn, NY 11201; Polytechnic University, Brooklyn, NY 11201; Carnegie Mellon University, Pittsburgh, PA 15213

  • Venue:
  • Journal of VLSI Signal Processing Systems - special issue on multimedia signal processing
  • Year:
  • 1998

Abstract

Understanding of the scene content of a video sequence is very important for content-based indexing and retrieval of multimedia databases. Research in this area in the past several years has focused on the use of speech recognition and image analysis techniques. As a complementary effort to the prior work, we have focused on using the associated audio information (mainly the nonspeech portion) for video scene analysis. As an example, we consider the problem of discriminating five types of TV programs, namely commercials, basketball games, football games, news reports, and weather forecasts. A set of low-level audio features is proposed for characterizing semantic contents of short audio clips. The linear separability of different classes under the proposed feature space is examined using a clustering analysis. The effective features are identified by evaluating the intracluster and intercluster scattering matrices of the feature space. Using these features, a neural net classifier was successful in separating the above five types of TV programs. By evaluating the changes between the feature vectors of adjacent clips, we also can identify scene breaks in an audio sequence quite accurately. These results demonstrate the capability of the proposed audio features for characterizing the semantic content of an audio sequence.
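
The two analysis steps mentioned in the abstract, scatter-matrix evaluation of features and detection of scene breaks from changes between adjacent clips, can be illustrated with a minimal sketch. The abstract does not give the paper's exact separability criterion, distance measure, or threshold, so everything below is an assumption for illustration: the function names `scatter_matrices`, `per_feature_separability`, and `scene_breaks` are hypothetical, the per-feature ratio of between-class to within-class scatter is one common criterion, and the Euclidean distance with a fixed threshold stands in for whatever change measure the authors used.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return (S_w, S_b): within-class and between-class scatter matrices.

    X is an (n_clips, n_features) array, y holds one class label per clip.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for label in np.unique(y):
        Xc = X[y == label]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        # Spread of this class around its own mean.
        S_w += centered.T @ centered
        # Spread of the class mean around the global mean, weighted by class size.
        diff = (mean_c - mean_all).reshape(-1, 1)
        S_b += len(Xc) * (diff @ diff.T)
    return S_w, S_b

def per_feature_separability(X, y, eps=1e-12):
    """Assumed criterion: diagonal ratio of between- to within-class scatter."""
    S_w, S_b = scatter_matrices(X, y)
    return np.diag(S_b) / (np.diag(S_w) + eps)

def scene_breaks(features, threshold):
    """Indices of clips whose feature vector differs from the previous clip's
    by more than `threshold` in Euclidean distance (an assumed measure)."""
    features = np.asarray(features, dtype=float)
    dists = np.linalg.norm(np.diff(features, axis=0), axis=1)
    return [i + 1 for i, dist in enumerate(dists) if dist > threshold]

if __name__ == "__main__":
    # Toy data: feature 0 separates the two classes, feature 1 does not.
    X = [[0, 0], [1, 1], [10, 0], [11, 1]]
    y = [0, 0, 1, 1]
    print(per_feature_separability(X, y))
    # Toy clip sequence with one abrupt change between clip 1 and clip 2.
    print(scene_breaks([[0, 0], [0.1, 0], [5, 5], [5.1, 5]], threshold=1.0))
```

A feature with a high ratio has class means that are far apart relative to the spread inside each class, which is the sense in which a feature would count as "effective" under this assumed criterion. A real system would pick the break threshold from data instead of fixing it by hand.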