Improving the accuracy of the speech synthesis based phonetic alignment using multiple acoustic features

Authors:
Sérgio Paulo;Luís C. Oliveira
Affiliations:
Spoken Language Systems Lab., INESC-ID, IST, Lisbon, Portugal;Spoken Language Systems Lab., INESC-ID, IST, Lisbon, Portugal
Venue:
PROPOR'03 Proceedings of the 6th international conference on Computational processing of the Portuguese language
Year:
2003

Abstract

The phonetic alignment of spoken utterances for speech research is commonly performed by HMM-based speech recognizers in forced alignment mode, but training the phonetic segment models requires considerable amounts of annotated data. When no such material is available, a possible solution is to synthesize the same phonetic sequence and align the resulting speech signal with the spoken utterances. However, without a careful choice of the acoustic features used in this procedure, it can perform poorly when applied to continuous speech utterances. In this paper we propose a new method to select the best features to use in the alignment procedure for each pair of phonetic segment classes. The results show that this selection considerably reduces the segment boundary location errors.
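
The synthesis-based alignment idea described in the abstract can be illustrated with a minimal sketch. It assumes dynamic time warping (DTW) as the alignment procedure, which the abstract does not name, and it is not the authors' implementation. The function names `dtw_path` and `map_boundaries`, the Euclidean frame distance, and the frame-vector representation are all hypothetical choices made for illustration. The `dist` argument marks the place where a feature choice per pair of phonetic segment classes could plug in.

```python
from math import inf


def euclidean(a, b):
    """Distance between two feature frames (an assumed, generic choice)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def dtw_path(synth, spoken, dist=euclidean):
    """Return the DTW warping path as (synth_frame, spoken_frame) pairs.

    synth and spoken are sequences of feature vectors, one per frame.
    """
    n, m = len(synth), len(spoken)
    # cost[i][j]: best accumulated cost aligning the first i synthetic
    # frames with the first j spoken frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(synth[i - 1], spoken[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the last frame pair to the first one.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = []
        if i > 1 and j > 1:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 1:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 1:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return path


def map_boundaries(path, synth_boundaries):
    """Transfer segment boundaries (frame indices) known in the synthetic
    signal to the spoken signal, using the first spoken frame matched to
    each boundary frame."""
    first_match = {}
    for s, t in path:
        first_match.setdefault(s, t)
    return [first_match[b] for b in synth_boundaries]


# Toy example: one-dimensional "features", second segment starts at frame 2
# of the synthetic signal.
synth = [[0.0], [0.0], [1.0], [1.0]]
spoken = [[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]]
path = dtw_path(synth, spoken)
spoken_boundaries = map_boundaries(path, [2])
```

Because the segment boundaries of the synthetic signal are known from the synthesizer, mapping them through the warping path yields boundary estimates for the spoken utterance without any annotated training data. In practice the frames would be acoustic feature vectors rather than the toy scalars used here.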