Discrimination of speech from nonspeech in broadcast news based on modulation frequency features

Authors:
Maria Markaki; Yannis Stylianou
Affiliations:
Computer Science Department, University of Crete, Greece; Computer Science Department, University of Crete, Greece and Institute of Computer Science, FORTH, Greece
Venue:
Speech Communication
Year:
2011

Abstract

In audio content analysis, discriminating speech from non-speech is the first processing step before speaker segmentation and recognition or speech transcription. Speech/non-speech segmentation algorithms usually combine a frame-based scoring phase using MFCC features with a smoothing phase. In this paper, a content-based speech discrimination algorithm is designed to exploit the long-term information inherent in the modulation spectrum. To address the varying degrees of redundancy and discriminative power of the acoustic and modulation frequency subspaces, we first reduce dimensionality with a generalization of the SVD to tensors, the Higher-Order SVD (HOSVD). Projecting the modulation spectral features onto the principal axes with the highest energy in each subspace yields a compact set of features with minimal redundancy. We further estimate the relevance of these projections to speech discrimination by their mutual information with the target class. The system is built on a segment-based SVM classifier that detects the presence of voice activity in the audio signal. Detection experiments on Greek and US English broadcast news data, comprising many speakers in various acoustic conditions, suggest that the system provides information complementary to state-of-the-art mel-cepstral features.
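The HOSVD-based reduction and mutual-information ranking described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the input is assumed to be a third-order tensor of modulation spectral features laid out as acoustic frequency × modulation frequency × segments (computing the modulation spectrum itself is not shown), the number of retained axes per subspace is set by a hypothetical cumulative-energy threshold `energy`, and relevance is scored with a simple quantile-histogram estimate of mutual information. All helper names (`unfold`, `mode_singular_vectors`, `hosvd_project`, `mutual_information`, `rank_by_relevance`) and the toy data are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: every fiber along `mode` becomes one column."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_singular_vectors(tensor, mode, energy=0.9):
    """Leading left singular vectors of the mode-n unfolding.

    Keeps the fewest axes whose squared singular values reach `energy`
    of the total (a hypothetical truncation rule, not the paper's).
    """
    u, s, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(cumulative, energy)) + 1, len(s))
    return u[:, :k]


def hosvd_project(tensor, energy=0.9):
    """Project each segment onto truncated acoustic and modulation bases.

    tensor: array of shape (n_acoustic, n_modulation, n_segments).
    Returns an (n_segments, k_acoustic * k_modulation) feature matrix.
    """
    u_ac = mode_singular_vectors(tensor, 0, energy)
    u_mod = mode_singular_vectors(tensor, 1, energy)
    # Z[:, :, n] = u_ac.T @ X[:, :, n] @ u_mod for every segment n
    z = np.einsum("ai,amn,mj->ijn", u_ac, tensor, u_mod)
    return z.reshape(-1, tensor.shape[2]).T


def mutual_information(feature, labels, n_bins=8):
    """Plug-in estimate of I(feature; label) in nats, labels in {0, 1}.

    The feature is discretized into roughly equal-count (quantile) bins.
    """
    edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(feature, edges)
    joint = np.zeros((n_bins, 2))
    np.add.at(joint, (bins, labels), 1)
    joint /= joint.sum()
    # Product of the marginals, i.e. the joint if feature and label were independent
    independent = joint.sum(axis=1, keepdims=True) * joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / independent[nz])))


def rank_by_relevance(features, labels, n_bins=8):
    """Order feature columns by decreasing mutual information with labels."""
    scores = np.array([mutual_information(features[:, k], labels, n_bins)
                       for k in range(features.shape[1])])
    return np.argsort(scores)[::-1], scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_ac, n_mod, n_seg = 16, 12, 400          # hypothetical sizes
    labels = np.repeat([0, 1], n_seg // 2)    # 0 = non-speech, 1 = speech
    X = rng.random((n_ac, n_mod, n_seg))
    # Toy speech cue: extra energy in one low modulation bin, mid acoustic bands
    X[4:10, 3, labels == 1] += 2.0
    feats = hosvd_project(X, energy=0.9)
    order, scores = rank_by_relevance(feats, labels)
    print(feats.shape, order[:5], np.round(scores[order[:5]], 3))
```

Only the acoustic and modulation modes are truncated; the segment mode is left intact because each segment needs its own feature vector. In a full system, the top-ranked columns of `feats` would then feed a segment-level classifier such as an SVM, which this sketch omits.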