A fast audio classification from MPEG coded data

Authors:
Y. Nakajima;Yang Lu;M. Sugano;A. Yoneyama;H. Yanagihara;A. Kurematsu
Affiliations:
KDD R, Saitama, Japan;-;-;-;-;-
Venue:
ICASSP '99 Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 06
Year:
1999

Citing 0
Cited 9

A compressed domain beat detector using MP3 audio bitstreams
MULTIMEDIA '01 Proceedings of the ninth ACM international conference on Multimedia

A Neural Multi-expert Classification System for MPEG Audio Segmentation
ICAPR '01 Proceedings of the Second International Conference on Advances in Pattern Recognition

A Survey of MPEG-1 Audio, Video and Semantic Analysis Techniques
Multimedia Tools and Applications

Audio-Based Shot Classification for Audiovisual Indexing Using PCA, MGD and Fuzzy Algorithm
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences

Audio segmentation in AAC domain for content analysis
WiCOM'09 Proceedings of the 5th International Conference on Wireless communications, networking and mobile computing

Audio signal representations for indexing in the transform domain
IEEE Transactions on Audio, Speech, and Language Processing

On similarity search in audio signals using adaptive sparse approximations
AMR'09 Proceedings of the 7th international conference on Adaptive multimedia retrieval: understanding media and adapting to the user

Shot classification and scene segmentation based on MPEG compressed movie analysis
PCM'04 Proceedings of the 5th Pacific Rim conference on Advances in Multimedia Information Processing - Volume Part I

Video story segmentation and its application to personal video recorders
CIVR'05 Proceedings of the 4th international conference on Image and Video Retrieval

Abstract

Audio classification has become an important task for applications such as automatic keyword spotting and other content-based audio-visual query systems. In this paper, we describe a fast and accurate audio classification method that operates in the MPEG coded data domain. First, silent segments are detected using an approach that is robust to different recording conditions. The non-silent segments are then classified into three types (music, speech, and applause) using the temporal density, bandwidth, and center frequency of the subband energy. To remain robust across as wide a variety of audio sources as possible, we use a Bayes discriminant function for multivariate Gaussian distributions instead of manually adjusting a threshold for each discriminator. In the experiment, every one-second segment of MPEG audio data is classified, and about 90% of audio and speech segments are successfully detected. As for detection speed, the method requires less than 20% of the processing power needed for MPEG audio decoding.
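
The classification step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the features, so `center_and_bandwidth` assumes an energy-weighted mean subband index and its standard deviation, temporal density is left out, and all function names are hypothetical. The discriminant is the standard Gaussian form g_k(x) = -1/2 (x - mu_k)^T Sigma_k^{-1} (x - mu_k) - 1/2 ln|Sigma_k| + ln P(k), and the class with the largest value is chosen.

```python
import math


def center_and_bandwidth(subband_energy):
    """Assumed feature definitions (not taken from the paper):
    energy-weighted mean subband index and the spread around it."""
    total = sum(subband_energy)
    if total <= 0.0:
        return 0.0, 0.0
    center = sum(i * e for i, e in enumerate(subband_energy)) / total
    var = sum(e * (i - center) ** 2 for i, e in enumerate(subband_energy)) / total
    return center, math.sqrt(var)


def fit_gaussian(samples):
    """Estimate the mean vector and (unbiased) covariance matrix of one class."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [[sum((s[a] - mean[a]) * (s[b] - mean[b]) for s in samples) / (n - 1)
            for b in range(d)] for a in range(d)]
    return mean, cov


def solve_and_logdet(cov, rhs):
    """Solve cov * y = rhs by Gaussian elimination with partial pivoting.
    Returns (y, ln|det(cov)|)."""
    d = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(cov)]  # augmented matrix
    logdet = 0.0
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        logdet += math.log(abs(a[col][col]))
        for r in range(col + 1, d):
            factor = a[r][col] / a[col][col]
            for c in range(col, d + 1):
                a[r][c] -= factor * a[col][c]
    y = [0.0] * d
    for i in reversed(range(d)):  # back substitution
        y[i] = (a[i][d] - sum(a[i][c] * y[c] for c in range(i + 1, d))) / a[i][i]
    return y, logdet


def discriminant(x, mean, cov, prior):
    """Bayes discriminant for a multivariate Gaussian class model."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    y, logdet = solve_and_logdet(cov, diff)
    mahalanobis = sum(di * yi for di, yi in zip(diff, y))
    return -0.5 * mahalanobis - 0.5 * logdet + math.log(prior)


def classify(x, models):
    """models maps a label to (mean, cov, prior); return the best-scoring label."""
    return max(models, key=lambda label: discriminant(x, *models[label]))
```

In use, one would fit a `(mean, cov)` pair per class from labelled feature vectors with `fit_gaussian`, attach a prior to each, and call `classify` on the feature vector of each one-second segment. Learning the class models from data replaces the hand-tuned per-feature thresholds that the abstract mentions avoiding.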