In this paper, a non-uniform time-scale modification (TSM) method is proposed for increasing or decreasing speech rate. The proposed method modifies the durations of vowel and pause segments by different modification factors. Vowel segments are modified by factors based on their identities, and pause segments by uniform factors based on the desired speaking rate. Consonant and transition (consonant-to-vowel) segments are not modified. These modification factors are derived from an analysis of slow and fast speech collected from professional radio artists. In the proposed method, vowel onset points (VOPs) are used to mark the consonant, transition and vowel regions, and instants of significant excitation (ISEs) are used to perform TSM as required. The VOPs indicate the instants at which vowel onsets occur. The ISEs, also known as epochs, indicate the instants of glottal closure during voiced speech and some random excitations, such as burst onsets, during non-voiced speech. In this work, VOPs are determined by combining multiple sources of evidence: the excitation source, spectral peaks, the modulation spectrum and the uniformity of epoch intervals. The ISEs are determined using a zero-frequency filter method. The performance of the proposed non-uniform TSM scheme is compared with uniform and existing non-uniform TSM schemes using epoch-based and time-domain pitch-synchronous overlap-and-add (TD-PSOLA) methods.
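
The segment-dependent scaling rule described above can be illustrated with a minimal Python sketch. It covers only the duration bookkeeping: vowels are scaled by an identity-dependent factor, pauses by a single uniform factor, and consonant and transition segments are left unchanged. The `Segment` type, the function name `modified_durations`, and all numeric factors are hypothetical and chosen for illustration; they are not the values derived in the paper, and the sketch does not perform VOP detection, epoch extraction, or waveform resynthesis.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    kind: str        # "consonant", "transition", "vowel", or "pause"
    start: float     # segment start time in seconds
    end: float       # segment end time in seconds
    label: str = ""  # vowel identity; only consulted when kind == "vowel"


def modified_durations(segments, vowel_factors, pause_factor):
    """Return the target duration of each segment after non-uniform TSM.

    vowel_factors maps a vowel identity to its modification factor
    (hypothetical values); vowels without an entry are left unchanged.
    pause_factor is the single factor applied to every pause.
    """
    durations = []
    for seg in segments:
        duration = seg.end - seg.start
        if seg.kind == "vowel":
            factor = vowel_factors.get(seg.label, 1.0)
        elif seg.kind == "pause":
            factor = pause_factor
        else:
            # Consonant and transition segments are not modified.
            factor = 1.0
        durations.append(duration * factor)
    return durations


# Illustrative input: one consonant-vowel unit followed by a pause.
segments = [
    Segment("consonant", 0.00, 0.05),
    Segment("transition", 0.05, 0.08),
    Segment("vowel", 0.08, 0.20, label="a"),
    Segment("pause", 0.20, 0.50),
]
# Assumed factors for slowing speech down; not taken from the paper.
new_durations = modified_durations(segments, {"a": 1.5}, pause_factor=2.0)
```

In a full system the segment boundaries would come from the detected VOPs, and the target durations would be realized by repeating or dropping epoch-aligned pitch cycles within each segment.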