We focus on the problem of audio classification into speech and music for multimedia applications. In particular, we present a comparison between two techniques for speech/music discrimination. The first method is based on the zero-crossing rate and Bayesian classification; it is computationally very simple and gives good results on pure music or pure speech. Simulation results show, however, that performance degrades when a music segment also contains superimposed speech, or has strong rhythmic components. To overcome these problems, we propose a second method that uses more features and is based on a neural network (specifically, a multi-layer perceptron). In this case we obtain better performance at the expense of a limited increase in computational complexity. In practice, the proposed neural network is simple to implement if a suitable polynomial is used as the activation function, and a real-time implementation is possible even on low-cost embedded systems.
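The first method described above can be sketched as follows: a per-frame zero-crossing rate is computed over a segment, and the segment's mean ZCR is classified with a Bayesian decision under one Gaussian model per class. This is a minimal illustrative sketch, not the authors' implementation; the frame sizes and the single-Gaussian class models (`speech_model`, `music_model` as mean/variance pairs) are assumptions for demonstration, and in practice the model parameters would be estimated from labelled training data.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of consecutive-sample sign changes in one frame."""
    signs = np.sign(frame)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:-1] != signs[1:])

def zcr_features(signal, frame_len=256, hop=128):
    """Per-frame ZCR over a signal segment (frame sizes are assumed)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([zero_crossing_rate(f) for f in frames])

def gaussian_log_likelihood(x, mean, var):
    """Log-likelihood of x under a univariate Gaussian."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def classify_segment(signal, speech_model, music_model):
    """Maximum-likelihood (equal-prior Bayesian) decision on the
    segment's mean ZCR; each model is a (mean, variance) pair."""
    x = zcr_features(signal).mean()
    ll_speech = gaussian_log_likelihood(x, *speech_model)
    ll_music = gaussian_log_likelihood(x, *music_model)
    return "speech" if ll_speech > ll_music else "music"
```

The appeal noted in the abstract is visible here: per segment, the method needs only sign comparisons and one two-way likelihood test, so its cost is negligible.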
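For the second method, the point about embedded implementation can be illustrated with an MLP forward pass whose activation is a polynomial rather than a transcendental sigmoid, so inference needs only multiplies and adds. This is a hedged sketch: the network size, the cubic polynomial (a truncated Taylor approximation of tanh, clipped to its monotone range), and the random placeholder weights are all assumptions, not the paper's actual network; real weights would come from training on the multi-feature inputs.

```python
import numpy as np

def poly_activation(x):
    """Cubic polynomial standing in for tanh: x - x^3/3 on [-1, 1]
    (first Taylor terms of tanh), with the input clipped so the
    function stays monotone. Costs only multiplies and adds."""
    x = np.clip(x, -1.0, 1.0)
    return x - (x ** 3) / 3.0

class SpeechMusicMLP:
    """Single-hidden-layer perceptron for a two-class decision.
    Weights here are random placeholders for illustration only."""
    def __init__(self, n_in, n_hidden, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W1 = rng.standard_normal((n_hidden, n_in)) * 0.1
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.standard_normal(n_hidden) * 0.1
        self.b2 = 0.0

    def forward(self, features):
        """Hidden layer then scalar output, both through the
        polynomial activation."""
        h = poly_activation(self.W1 @ features + self.b1)
        return poly_activation(self.w2 @ h + self.b2)

    def classify(self, features):
        """Sign of the output gives the class label."""
        return "speech" if self.forward(features) > 0.0 else "music"
```

Because `poly_activation` avoids `exp` entirely, the whole forward pass maps to fixed-point multiply-accumulate operations, which is what makes a real-time port to a low-cost embedded processor plausible.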