A speech/music discriminator based on RMS and zero-crossings

Authors:
C. Panagiotakis;G. Tziritas
Affiliations:
Dept. of Comput. Sci., Univ. of Crete, Heraklion, Greece
Venue:
IEEE Transactions on Multimedia
Year:
2005

Citing 0
Cited 27

Automatic classification of speech and music using neural networks

Proceedings of the 2nd ACM international workshop on Multimedia databases
PebbleBox and CrumbleBag: tactile interfaces for granular synthesis

NIME '04 Proceedings of the 2004 conference on New interfaces for musical expression
Tiling slideshow

MULTIMEDIA '06 Proceedings of the 14th annual ACM international conference on Multimedia
Semantic-event based analysis and segmentation of wedding ceremony videos

Proceedings of the international workshop on Workshop on multimedia information retrieval
New speech/music discrimination approach based on fundamental frequency estimation

Multimedia Tools and Applications
Classification of audio signals using SVM and RBFNN

Expert Systems with Applications: An International Journal
An Efficient Approach for Classification of Speech and Music

PCM '08 Proceedings of the 9th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing
A wavelet-based parameterization for speech/music discrimination

Computer Speech and Language
Noise robust features for speech/music discrimination in real-time telecommunication

ICME'09 Proceedings of the 2009 IEEE international conference on Multimedia and Expo
A two phase method for general audio segmentation

ICME'09 Proceedings of the 2009 IEEE international conference on Multimedia and Expo
Signal segmentation and modelling based on equipartition principle

DSP'09 Proceedings of the 16th international conference on Digital Signal Processing
Two-stage cascaded classification approach based on genetic fuzzy learning for speech/music discrimination

Engineering Applications of Artificial Intelligence
Online speech/music segmentation based on the variance mean of filter bank energy

EURASIP Journal on Advances in Signal Processing
Classification of audio signals using AANN and GMM

Applied Soft Computing
Audio signal processing using time-frequency approaches: coding, classification, fingerprinting, and watermarking

EURASIP Journal on Advances in Signal Processing - Special issue on time-frequency analysis and its applications to multimedia signals
Pattern classification models for classifying and indexing audio signals

Engineering Applications of Artificial Intelligence
Hierarchical audio content classification system using an optimal feature selection algorithm

Multimedia Tools and Applications
Scene segmentation of wedding party videos by scenario-based matching with example videos

MM '11 Proceedings of the 19th ACM international conference on Multimedia
Audio content analysis for understanding structures of scene in video

ICIC'06 Proceedings of the 2006 international conference on Intelligent Computing - Volume Part I
A multi-class method for detecting audio events in news broadcasts

SETN'10 Proceedings of the 6th Hellenic conference on Artificial Intelligence: theories, models and applications
Improvement of commercial boundary detection using audiovisual features

PCM'05 Proceedings of the 6th Pacific-Rim conference on Advances in Multimedia Information Processing - Volume Part I
Mining movies to extract song sequences

Proceedings of the Eleventh International Workshop on Multimedia Data Mining
Spectral histogram of oriented gradients (SHOGs) for Tamil language male/female speaker classification

International Journal of Speech Technology
Mining movies for song sequences with video based music genre identification system

Information Processing and Management: an International Journal
DemoCut: generating concise instructional videos for physical demonstrations

Proceedings of the 26th annual ACM symposium on User interface software and technology
Audio Classification and Retrieval Using Wavelets and Gaussian Mixture Models

International Journal of Multimedia Data Engineering & Management
Mining movie archives for song sequences

Multimedia Tools and Applications

Abstract

Over the last several years, major efforts have been made to develop methods for extracting information from audiovisual media, so that they can be stored in and retrieved from databases automatically, based on their content. In this work we deal with the characterization of an audio signal, which may be part of a larger audiovisual system or may be autonomous, as for example in the case of an audio recording stored digitally on disk. Our goal was first to develop a system for segmenting the audio signal, and then to classify each segment into one of two main categories: speech or music. Among the system's requirements are processing speed and the ability to function in a real-time environment with a short response delay. Because of the restriction to two classes, the number of extracted characteristics is considerably reduced and, moreover, the required computations are straightforward. Experimental results show that efficiency is exceptionally good, without sacrificing performance. Segmentation is based on the distribution of the mean signal amplitude, whereas classification uses an additional characteristic related to frequency. The classification algorithm may be used either in conjunction with the segmentation algorithm, in which case it verifies or refutes a music-speech or speech-music change, or autonomously, on given audio segments. The basic characteristics are computed over 20 ms intervals, so segment boundaries are located to within 20 ms. The minimum segment length is one second. The segmentation and classification algorithms were benchmarked on a large data set, with correct segmentation about 97% of the time and correct classification about 95% of the time.
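
The abstract names only the basic characteristics (the title gives them as RMS and zero-crossings) and the 20 ms interval over which they are computed. As a rough illustration, the following minimal sketch computes both values per non-overlapping 20 ms frame. It assumes a mono signal held as a list of floats; the function name `frame_features`, its parameters, and the 8 kHz test tone are hypothetical choices made for this example. It does not reproduce the authors' segmentation or classification algorithms, which the abstract does not specify.

```python
import math


def frame_features(samples, sample_rate, frame_ms=20):
    """Return a list of (rms, zero_crossings) tuples, one per frame.

    Hypothetical helper: frames are non-overlapping and frame_ms long;
    a trailing partial frame is dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame is shorter than one sample")
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Root mean square: a measure of the frame's signal amplitude.
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Zero-crossings: sign changes between consecutive samples,
        # a coarse indicator of frequency content.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        features.append((rms, crossings))
    return features


# Usage: one second of a 440 Hz tone sampled at 8 kHz (assumed values).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]
features = frame_features(tone, sample_rate)
print(len(features), features[0])
```

With 20 ms frames, one second of audio yields 50 feature pairs, which is consistent with the abstract's statement that segment boundaries are resolved to within 20 ms while the minimum segment length is one second.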