Concatenative speech synthesis systems attempt to minimize audible signal discontinuities between two successive concatenated units. An objective distance measure that can predict audible discontinuities is therefore very important, particularly in unit selection synthesis, where units are selected from a large inventory at run time. In this paper, we describe a perceptual test that measures how often humans detect concatenation discontinuities, and we then evaluate 13 objective distance measures on their ability to predict the human results. The criteria used to compare these distances include the detection rate, the Bhattacharyya measure of separability of two distributions, and receiver operating characteristic (ROC) curves. Results show that the Kullback-Leibler distance on power spectra has the highest detection rate, followed by the Euclidean distance on Mel-frequency cepstral coefficients (MFCC).
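As an illustration of the two best-performing measures named above, the following is a minimal sketch and not the paper's implementation. It assumes that the Kullback-Leibler distance is taken in its symmetric form over power spectra normalized to unit sum, and that each MFCC frame is a plain list of coefficients. The function names `symmetric_kl` and `euclidean`, the `eps` floor, and the sample values are hypothetical.

```python
import math


def symmetric_kl(p, q, eps=1e-12):
    """Symmetric Kullback-Leibler distance between two power spectra.

    Assumption: each spectrum is normalized to sum to 1 so that it can
    be treated as a distribution over frequency bins.
    """
    if len(p) != len(q):
        raise ValueError("spectra must have the same number of bins")
    sum_p, sum_q = sum(p), sum(q)
    if sum_p <= 0 or sum_q <= 0:
        raise ValueError("spectra must have positive total power")
    # Floor each normalized bin at eps to avoid log(0) on empty bins.
    pn = [max(x / sum_p, eps) for x in p]
    qn = [max(x / sum_q, eps) for x in q]
    # sum (p - q) * log(p / q) equals KL(p||q) + KL(q||p).
    return sum((a - b) * math.log(a / b) for a, b in zip(pn, qn))


def euclidean(a, b):
    """Euclidean distance between two feature vectors (e.g. MFCC frames)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical frames on either side of a concatenation point.
left_spectrum = [4.0, 2.0, 1.0, 1.0]
right_spectrum = [1.0, 1.0, 2.0, 4.0]
print(symmetric_kl(left_spectrum, right_spectrum))
print(euclidean([0.0, 0.0], [3.0, 4.0]))
```

Because of the normalization step, this sketch ignores overall level differences between the two frames: two spectra that differ only by a gain factor give a distance of zero. Whether that matches the measure evaluated in the paper is an assumption.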