In this paper, we propose a new approach for extracting and representing prosodic features directly from the speech signal. We hypothesize that prosody is linked to linguistic units such as syllables, and that it is manifested as changes in measurable parameters such as fundamental frequency (F0), duration and energy. In this work, a syllable-like unit is chosen as the basic unit for representing prosodic characteristics. An approximate segmentation of continuous speech into syllable-like units is obtained by automatically locating the vowel onset points (VOPs). The VOPs serve as reference points for extracting prosodic features from the speech signal. Quantitative parameters are used to represent the F0 and energy contours in each region between two consecutive VOPs. Prosodic features extracted using this approach may be useful in applications such as language or speaker recognition, where explicit phoneme or syllable boundaries are not easily available. The effectiveness of the derived prosodic features for language and speaker recognition is evaluated on the NIST language recognition evaluation 2003 and the extended data task of the NIST speaker recognition evaluation 2003, respectively.
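To make the idea of VOP-anchored features concrete, the following is a minimal sketch, not the parameterization used in the paper. It assumes that frame-level F0 and energy tracks and a list of VOP frame indices are already available, that unvoiced frames carry an F0 of 0, and that the frame shift is 10 ms. The function name `segment_prosody` and the particular parameters (duration, mean F0, F0 change, relative F0 peak position, energy change) are hypothetical choices made for illustration only.

```python
import numpy as np


def segment_prosody(f0, energy, vop_frames, frame_shift_s=0.01):
    """Summarise F0 and energy in each region between consecutive VOPs.

    All parameters computed here are illustrative assumptions; the abstract
    does not specify which quantitative parameters the authors use.
    """
    f0 = np.asarray(f0, dtype=float)
    energy = np.asarray(energy, dtype=float)
    features = []
    # Each pair of consecutive VOPs delimits one syllable-like region.
    for start, end in zip(vop_frames[:-1], vop_frames[1:]):
        seg_f0 = f0[start:end]
        seg_en = energy[start:end]
        voiced = seg_f0[seg_f0 > 0]  # assumption: unvoiced frames have F0 == 0
        if voiced.size == 0:
            continue  # skip regions with no voiced frames
        features.append({
            "duration_s": (end - start) * frame_shift_s,
            "f0_mean": float(voiced.mean()),
            # Change from first to last voiced frame (rise > 0, fall < 0).
            "f0_change": float(voiced[-1] - voiced[0]),
            # Position of the F0 peak as a fraction of the region length.
            "f0_peak_pos": float(np.argmax(seg_f0)) / (end - start),
            "energy_change": float(seg_en[-1] - seg_en[0]),
        })
    return features


if __name__ == "__main__":
    # Toy contours: two syllable-like regions of five frames each.
    f0 = [0, 100, 110, 120, 0, 0, 130, 120, 110, 0]
    energy = [1, 2, 3, 2, 1, 1, 2, 4, 2, 1]
    for feat in segment_prosody(f0, energy, vop_frames=[0, 5, 10]):
        print(feat)
```

Because each feature vector is tied to a VOP-delimited region rather than to a labelled phoneme or syllable, a sketch like this needs no transcription, which is the property the abstract highlights for language and speaker recognition.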