Robust singing detection in speech/music discriminator design

Authors:
Wu Chou;Liang Gu
Affiliations:
Lucent Technol. Bell Labs., Murray Hill, NJ, USA;-
Venue:
ICASSP '01: Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 02
Year:
2001

Cited by 8

A singer identification technique for content-based classification of MP3 music objects
Proceedings of the eleventh international conference on Information and knowledge management

Singing voice detection in popular music
Proceedings of the 12th annual ACM international conference on Multimedia

Automatic classification of speech and music using neural networks
Proceedings of the 2nd ACM international workshop on Multimedia databases

Automatic singer identification
ICME '03 Proceedings of the 2003 International Conference on Multimedia and Expo - Volume 2

Detection of speech and music based on spectral tracking
Speech Communication

A wavelet-based parameterization for speech/music discrimination
Computer Speech and Language

Classification of similar impact sounds
ICISP'10 Proceedings of the 4th international conference on Image and signal processing

Context-Aware features for singing voice detection in polyphonic music
AMR'11 Proceedings of the 9th international conference on Adaptive Multimedia Retrieval: large-scale multimedia retrieval and evaluation

Abstract

This paper proposes an approach for robust singing detection in speech/music discrimination and applies it to audio indexing. Conventional speech/music discriminators perform reasonably well on regular music signals but often perform poorly on singing segments, mainly because speech and singing signals are acoustically very close and the traditional features used in speech recognition do not provide a reliable cue for telling them apart. To improve the robustness of speech/music discrimination, a new set of features derived from the harmonic coefficient and its 4 Hz modulation values is developed; these features provide additional, reliable cues for separating speech from singing. A rule-based post-filtering scheme is also described, which leads to further improvements in speech/music discrimination. Source-independent audio indexing experiments on the PBS Skills database indicate that the proposed approach can greatly reduce the classification error rate on singing segments in the audio stream. Compared with existing approaches, the overall segmentation error rate is reduced by more than 30%, averaged over all shows in the database.