SPEECH/MUSIC DISCRIMINATION BASED ON WARPING TRANSFORMATION AND FUZZY LOGIC FOR INTELLIGENT AUDIO CODING

Authors:
Jose Enrique Munoz-Exposito;Sebastian Garcia Galan;Nicolas Ruiz Reyes;Pedro Vera Candeas
Affiliations:
Telecommunication Engineering Department, University of Jaen, Jaen, Spain;Telecommunication Engineering Department, University of Jaen, Jaen, Spain;Telecommunication Engineering Department, University of Jaen, Jaen, Spain;Telecommunication Engineering Department, University of Jaen, Jaen, Spain
Venue:
Applied Artificial Intelligence
Year:
2009

Abstract

Automatic speech/music discrimination is an important tool in many multimedia applications. This article presents an evolutionary, fuzzy rule-based speech/music discrimination approach for intelligent audio coding that relies on a single simple feature, the Warped LPC-based Spectral Centroid (WLPC-SC). WLPC-SC is compared with the classical audio classification features proposed in the literature to assess its discriminatory power. The vector describing this psychoacoustically motivated feature is reduced to a few statistical values (mean, variance, and skewness), which are then mapped to a new feature space by linear discriminant analysis (LDA) to increase classification accuracy. Classification is performed by a support vector machine (SVM) applied to the features in the transformed space. The final decision is made by a fuzzy expert system, which improves on the SVM's accuracy by taking into account the labels that the classifier assigned to past audio frames; the resulting improvement in accuracy is also reported. Experimental results show that the speech/music discriminator is robust and fast, making it suitable for intelligent audio coding.
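
Two steps of the pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `summarize` and `smooth_labels`, the `history` parameter, and the sample values are hypothetical. The first function reduces a sequence of per-frame feature values (for example, WLPC-SC values, whose computation is not shown here) to mean, variance, and skewness. The second is a plain majority vote over past labels, used here only as a simple stand-in for the fuzzy expert system, whose actual rules are not given in the abstract.

```python
def summarize(values):
    """Return (mean, variance, skewness) of per-frame feature values.

    Uses population (biased) moments; this choice is an assumption.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        # Constant input: skewness is undefined, report 0.0 by convention.
        return mean, 0.0, 0.0
    third = sum((v - mean) ** 3 for v in values) / n
    skew = third / var ** 1.5
    return mean, var, skew


def smooth_labels(labels, history=3):
    """Majority vote over the current label and up to `history` past labels.

    Stand-in for the fuzzy expert system (NOT the authors' rule base).
    Votes use the classifier's raw labels, not previously smoothed ones.
    """
    smoothed = []
    for i, current in enumerate(labels):
        window = labels[max(0, i - history): i + 1]
        speech = sum(1 for label in window if label == "speech")
        music = len(window) - speech
        if speech > music:
            smoothed.append("speech")
        elif music > speech:
            smoothed.append("music")
        else:
            smoothed.append(current)  # tie: keep the classifier's own label
    return smoothed


if __name__ == "__main__":
    # Hypothetical per-frame feature values and classifier labels.
    print(summarize([1.0, 2.0, 3.0, 4.0, 5.0]))
    raw = ["speech", "speech", "speech", "music", "speech", "speech"]
    print(smooth_labels(raw, history=2))
```

In this sketch the isolated "music" label in the example is outvoted by its neighbours, which is the kind of correction a history-aware decision stage is meant to provide. The LDA and SVM stages between the two functions are omitted.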