This paper presents novel features and an architecture for an automatic on-line acoustic classification and segmentation system. The system performs speech/non-speech segmentation (with emphasis on accurate speech/music segmentation), gender segmentation, and speech bandwidth segmentation. It can be easily integrated into an automatic continuous speech recognition system, where information about individual acoustic segments can be used for acoustic model selection and adaptation, or as additional information in rich transcription output. Acoustic model adaptation can improve the speech recognition rate, and the additional information in a rich transcription is useful when searching for specific events or circumstances (for example, a male speaker talking over a telephone line). For speech/non-speech segmentation we propose a new set of features, called EVFB (Energy Variance of Filter Bank), which are based on the energy variance in a narrow frequency sub-band. The proposed features are also an efficient discriminator between speech and music. Segmentation cross-test results show that EVFB features are more robust than MFCC (mel-frequency cepstral coefficient) features. Two new features (modified spectral roll-off and high-frequency energy variance) are also proposed for speech bandwidth classification and segmentation. The results show good and robust performance of the automatic on-line acoustic segmentation system. All experiments and tests were performed on a radio broadcast database and the Slovenian BNSI Broadcast News database. Integrating the automatic on-line acoustic segmentation system into a continuous speech recognition system based on MFCC features requires only a small additional computational cost, because many of the proposed system's feature calculation procedures are shared with the MFCC feature calculation procedure.
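To make the idea of an energy-variance feature concrete, the following is a minimal sketch of how the variance of sub-band energy over time could be computed. It is not the paper's EVFB definition, which the abstract does not give. The uniform (non-mel) band layout, the Hamming window, the log compression, and all names and parameter values (`energy_variance_per_band`, `frame_len`, `hop`, `n_bands`) are assumptions chosen for illustration only.

```python
import cmath
import math
from statistics import pvariance


def frame_signal(x, frame_len, hop):
    """Split a sample sequence into overlapping frames of equal length."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def power_spectrum(frame):
    """Power spectrum of one Hamming-windowed frame (naive DFT, stdlib only)."""
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]
    xs = [a * w for a, w in zip(frame, window)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(xs[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(s) ** 2)
    return spectrum


def band_log_energies(frame, n_bands):
    """Log energy in each of n_bands uniform sub-bands (assumed layout, not mel)."""
    ps = power_spectrum(frame)
    edges = [i * len(ps) // n_bands for i in range(n_bands + 1)]
    # A small floor keeps the logarithm finite for silent bands.
    return [math.log(sum(ps[edges[b]:edges[b + 1]]) + 1e-10) for b in range(n_bands)]


def energy_variance_per_band(x, frame_len=64, hop=32, n_bands=4):
    """Variance over time of the log energy in each sub-band (hypothetical helper)."""
    frames = frame_signal(x, frame_len, hop)
    if not frames:
        raise ValueError("signal is shorter than one frame")
    energies = [band_log_energies(f, n_bands) for f in frames]
    return [pvariance([e[b] for e in energies]) for b in range(n_bands)]
```

Under these assumptions, a steady tone yields near-zero variance in every band, while a signal whose level changes over time (as speech does between syllables and pauses) yields a clearly larger variance in the bands it occupies. A practical system would use an FFT and the filter bank already computed for MFCC extraction rather than the naive DFT shown here.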