Non-uniform time scale modification using instants of significant excitation and vowel onset points

Authors:
K. Sreenivasa Rao;Anil Kumar Vuppala
Venue:
Speech Communication
Year:
2013

Abstract

In this paper, a non-uniform time scale modification (TSM) method is proposed for increasing or decreasing speech rate. The proposed method modifies the durations of vowel and pause segments by different modification factors. Vowel segments are modified by factors based on their identities, and pause segments by uniform factors based on the desired speaking rate. Consonant and transition (consonant-to-vowel) segments are not modified in the proposed TSM. These modification factors are derived from the analysis of slow and fast speech collected from professional radio artists. In the proposed TSM method, vowel onset points (VOPs) are used to mark the consonant, transition and vowel regions, and instants of significant excitation (ISEs) are used to perform TSM as required. The VOPs indicate the instants at which the onsets of vowels take place. The ISEs, also known as epochs, indicate the instants of glottal closure during voiced speech, and random excitations such as burst onsets during non-voiced speech. In this work, VOPs are determined using multiple sources of evidence: the excitation source, spectral peaks, the modulation spectrum and uniformity in epoch intervals. The ISEs are determined using a zero-frequency filter method. The performance of the proposed non-uniform TSM scheme is compared with uniform and existing non-uniform TSM schemes that use epoch-based and time-domain pitch synchronous overlap and add (TD-PSOLA) methods.
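The segment-wise rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Segment` type, the `plan_durations` helper, the segment boundaries and every numeric factor below are hypothetical, chosen only to show the idea. In the paper the factors come from an analysis of slow and fast speech, and the boundaries from VOP detection; here both are simply given as inputs. The sketch only plans target durations and does not perform the epoch-based waveform modification itself.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    label: str        # "consonant", "transition", "vowel" or "pause"
    start: float      # segment start time in seconds
    end: float        # segment end time in seconds
    vowel: str = ""   # vowel identity, used only when label == "vowel"


def plan_durations(segments, vowel_factors, pause_factor):
    """Return (segment, factor, new_duration) for each segment.

    Vowels are scaled by an identity-dependent factor, pauses by one
    uniform factor, and consonant/transition segments are left unchanged.
    """
    plan = []
    for seg in segments:
        duration = seg.end - seg.start
        if seg.label == "vowel":
            # Unknown vowel identities fall back to "no change" (assumption).
            factor = vowel_factors.get(seg.vowel, 1.0)
        elif seg.label == "pause":
            factor = pause_factor
        else:
            factor = 1.0
        plan.append((seg, factor, duration * factor))
    return plan


# Hypothetical segmentation of a short utterance (times in seconds).
segments = [
    Segment("consonant", 0.00, 0.05),
    Segment("transition", 0.05, 0.08),
    Segment("vowel", 0.08, 0.20, vowel="a"),
    Segment("pause", 0.20, 0.50),
]

# Made-up factors for slowing speech down; not values from the paper.
vowel_factors = {"a": 1.5}
pause_factor = 2.0

plan = plan_durations(segments, vowel_factors, pause_factor)
original_total = sum(seg.end - seg.start for seg, _, _ in plan)
modified_total = sum(new_duration for _, _, new_duration in plan)

for seg, factor, new_duration in plan:
    print(f"{seg.label:<10} factor={factor:.2f} new_duration={new_duration:.3f}s")
print(f"total: {original_total:.3f}s -> {modified_total:.3f}s")
```

Because only vowels and pauses are scaled, the overall change in utterance length differs from any single factor, which is the sense in which the modification is non-uniform.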