Improving the accuracy of the speech synthesis based phonetic alignment using multiple acoustic features

  • Authors:
  • Sérgio Paulo; Luís C. Oliveira

  • Affiliations:
  • Spoken Language Systems Lab., INESC-ID, IST, Lisbon, Portugal; Spoken Language Systems Lab., INESC-ID, IST, Lisbon, Portugal

  • Venue:
  • PROPOR'03 Proceedings of the 6th international conference on Computational processing of the Portuguese language
  • Year:
  • 2003

Abstract

The phonetic alignment of spoken utterances for speech research is commonly performed by HMM-based speech recognizers in forced-alignment mode, but training the phonetic segment models requires considerable amounts of annotated data. When no such material is available, a possible solution is to synthesize the same phonetic sequence and align the resulting speech signal with the spoken utterance. However, unless the acoustic features used in this procedure are chosen carefully, it can perform poorly on continuous speech. In this paper we propose a new method for selecting the best features to use in the alignment procedure for each pair of phonetic segment classes. The results show that this selection considerably reduces segment boundary location errors.
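
The synthesis-based alignment idea mentioned in the abstract can be illustrated with a small sketch. The abstract does not name the alignment algorithm, the distance measure, or the features, so everything below is an assumption made for illustration: it uses plain dynamic time warping (DTW) with a Euclidean frame distance, and the names `dtw_path` and `map_boundary` are hypothetical. It does not implement the paper's feature-selection method; it only shows how a boundary known in the synthetic signal might be transferred to the spoken one.

```python
import math


def dtw_path(ref, tgt):
    """Return a minimum-cost DTW path as a list of (ref_frame, tgt_frame) pairs.

    ref and tgt are sequences of equal-dimension feature vectors
    (assumed here: one vector per analysis frame).
    """
    n, m = len(ref), len(tgt)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], tgt[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def map_boundary(path, ref_boundary):
    """Map a boundary frame in the synthetic signal to the first aligned spoken frame."""
    return min(j for i, j in path if i == ref_boundary)


# Toy one-dimensional "features": the synthetic signal switches segment at
# frame 2, while the spoken signal holds the first segment one frame longer.
synthetic = [(0.0,), (0.0,), (1.0,), (1.0,)]
spoken = [(0.0,), (0.0,), (0.0,), (1.0,), (1.0,)]
path = dtw_path(synthetic, spoken)
spoken_boundary = map_boundary(path, 2)
```

In this toy setup the synthetic boundaries are known because the synthesizer produced the segments itself; which feature vectors feed the frame distance is exactly the choice the paper argues should depend on the pair of phonetic segment classes.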