A robust audio classification and segmentation method

Authors:
Lie Lu;Hao Jiang;HongJiang Zhang
Affiliations:
Microsoft Research, China, Beijing, PRC;Microsoft Research, China, Beijing, PRC;Microsoft Research, China, Beijing, PRC
Venue:
MULTIMEDIA '01 Proceedings of the ninth ACM international conference on Multimedia
Year:
2001

Abstract

In this paper, we present a robust algorithm for audio classification that segments and classifies an audio stream into speech, music, environment sound, and silence. Classification is performed in two steps, which makes the method suitable for different applications. The first step discriminates speech from non-speech, using a novel algorithm based on K-nearest-neighbor (KNN) classification and line spectral pair vector quantization (LSP VQ). The second step further divides the non-speech class into music, environment sound, and silence with a rule-based classification scheme. Some new features, such as the noise frame ratio and band periodicity, are introduced and discussed in detail. Our experiments in the context of video structure parsing show that the algorithms produce very satisfactory results.
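
The two-step structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the feature vectors, the LSP VQ stage, the rules, or any thresholds, so everything below is an assumption made for illustration. The function names (`knn_predict`, `classify_segment`), the feature dictionary keys, and the threshold values `silence_energy`, `noise_ratio_max`, and `periodicity_min` are all hypothetical, and a plain KNN vote stands in for the KNN and LSP VQ combination.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors nearest to query.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classify_segment(features, train, k=3,
                     silence_energy=0.01,   # hypothetical threshold
                     noise_ratio_max=0.3,   # hypothetical threshold
                     periodicity_min=0.5):  # hypothetical threshold
    """Two-step classification of one audio segment.

    features is a dict with hypothetical keys:
      "vector"            feature vector used by the KNN step
      "energy"            short-time energy of the segment
      "noise_frame_ratio" fraction of frames judged noise-like
      "band_periodicity"  periodicity measure over frequency bands
    """
    # Step 1: speech versus non-speech (plain KNN stands in for KNN + LSP VQ).
    if knn_predict(train, features["vector"], k) == "speech":
        return "speech"

    # Step 2: rule-based split of the non-speech class (illustrative rules only).
    if features["energy"] < silence_energy:
        return "silence"
    if (features["band_periodicity"] >= periodicity_min
            and features["noise_frame_ratio"] <= noise_ratio_max):
        return "music"
    return "environment sound"


# Toy training set with made-up two-dimensional feature vectors.
TRAIN = [
    ((0.10, 0.20), "speech"),
    ((0.15, 0.25), "speech"),
    ((0.20, 0.10), "speech"),
    ((0.90, 0.80), "non-speech"),
    ((0.85, 0.90), "non-speech"),
    ((0.80, 0.95), "non-speech"),
]

if __name__ == "__main__":
    segment = {"vector": (0.9, 0.9), "energy": 0.5,
               "noise_frame_ratio": 0.1, "band_periodicity": 0.8}
    print(classify_segment(segment, TRAIN))
```

Separating the two stages this way reflects the abstract's point that an application needing only speech/non-speech discrimination can stop after the first step, while the second step is applied only to segments already labeled non-speech.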