Knowledge-based analysis of speech mixed with sporadic environmental sounds
Computational auditory scene analysis
Construction and Evaluation of a Robust Multifeature Speech/Music Discriminator
ICASSP '97 Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97)-Volume 2 - Volume 2
ICME '03 Proceedings of the 2003 International Conference on Multimedia and Expo - Volume 3 (ICME '03) - Volume 03
Real-time discrimination of broadcast speech/music
ICASSP '96 Proceedings of the Acoustics, Speech, and Signal Processing, 1996. on Conference Proceedings., 1996 IEEE International Conference - Volume 02
Separation of harmonic sound sources using sinusoidal modeling
ICASSP '00 Proceedings of the Acoustics, Speech, and Signal Processing, 2000. on IEEE International Conference - Volume 02
Robust singing detection in speech/music discriminator design
ICASSP '01 Proceedings of the Acoustics, Speech, and Signal Processing, 200. on IEEE International Conference - Volume 02
Sinusoidal model based on instantaneous frequency attractors
IEEE Transactions on Audio, Speech, and Language Processing
A wavelet-based parameterization for speech/music discrimination
Computer Speech and Language
Online speech/music segmentation based on the variance mean of filter bank energy
EURASIP Journal on Advances in Signal Processing
Digital Signal Processing
Hi-index | 0.00 |
How to deal with sounds that include spectrally and temporally complex signals such as speech and music remains a problem in real-world audio information processing. We have devised (1) a classification method based on sinusoidal trajectories for speech and music and (2) a detection method based on (1) for speech with background music. Sinusoidal trajectories represent the temporal characteristics of each category of sounds such as speech, singing voice and musical instrument. From the trajectories, 20 temporal features are extracted and used to classify sound segments into the categories by using statistical classifiers. The average F"1 measure of the classification of nonmixed sounds was 0.939, which might be sufficiently high to apply to subsequent detection of sound categories in a mixed sound. To handle the temporal overlapping of sounds, we also developed an optimal spectral tracking algorithm with low computational complexity; it is based on dynamic programming (DP) with iterative improvement for the sinusoidal decomposition of signals. The classification and detection of a temporal mixture of speech and music are performed by a statistical integration of the temporal features of their trajectories and the optimization of the combination of their categories. The detection method was experimentally evaluated using 400 samples of mixed sounds, and the average of the narrow-band correlation coefficients and improvement in the segmental signal-to-noise ratio (SNR) were 0.55 and +5.67dB, respectively, which show effectiveness of the proposed detection method.