Unit selection using k-nearest neighbor search for concatenative speech synthesis

  • Authors:
  • Hideyuki Mizuno; Satoshi Takahashi

  • Affiliations:
  • NTT Cyber Space Laboratories, Yokosuka-Shi, Kanagawa, Japan; NTT Cyber Space Laboratories, Yokosuka-Shi, Kanagawa, Japan

  • Venue:
  • Proceedings of the 3rd International Universal Communication Symposium
  • Year:
  • 2009

Abstract

We propose a new approach to rapidly identifying adequate synthesis units in extremely large speech corpora. Our aim is to develop a concatenative speech synthesis system with high performance, in both speech quality and throughput, for various practical applications. Very large speech corpora allow more natural-sounding synthesized speech to be created; the downside is that locating the required synthesis units takes longer. The key to overcoming this problem is introducing state-of-the-art database retrieval technologies. The first selection step, based on a simple hash search, tabulates all synthesis unit candidates. The second step selects the N best candidates using nearest neighbor search, a typical database retrieval technique. Finally, the best sequence of synthesis units is determined by Viterbi search. A runtime measurement test and a subjective experiment are carried out. For a 15-hour corpus, their results confirm that the proposed approach reduces runtime by about 40% compared with using hash search alone, with no degradation in the quality of synthesized speech.
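The three-step selection described in the abstract (hash lookup, N-best pruning by nearest neighbor search, then Viterbi search) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the hash key is assumed to be a phoneme label, each unit is assumed to carry a target feature vector plus left-edge and right-edge feature vectors, and both the target cost and the join (concatenation) cost are assumed to be plain Euclidean distances. The names `Unit`, `build_index`, `n_best`, `viterbi`, `select_units` and the weight `w_join` are hypothetical. Brute-force k-NN stands in for whatever nearest neighbor index the paper actually uses.

```python
import heapq
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    uid: int
    label: str     # hash key; a phoneme label is an assumption
    target: tuple  # features compared with the target specification (assumed)
    start: tuple   # features at the unit's left edge, used for join cost (assumed)
    end: tuple     # features at the unit's right edge, used for join cost (assumed)


def dist(a, b):
    # Euclidean distance; the real cost functions are not given in the abstract
    return math.dist(a, b)


def build_index(units):
    """Step 1: hash table mapping each label to all candidate units."""
    index = defaultdict(list)
    for u in units:
        index[u.label].append(u)
    return index


def n_best(candidates, target_vec, n):
    """Step 2: keep the n candidates nearest to the target (brute-force k-NN)."""
    return heapq.nsmallest(n, candidates, key=lambda u: dist(u.target, target_vec))


def viterbi(lattice, targets, w_join=1.0):
    """Step 3: pick the unit sequence minimising total target + join cost."""
    cost = [dist(u.target, targets[0]) for u in lattice[0]]
    back = []  # back[t-1][j] = best predecessor index for lattice[t][j]
    for t in range(1, len(lattice)):
        new_cost, ptr = [], []
        for u in lattice[t]:
            # cumulative cost of reaching u from each predecessor unit
            scores = [cost[i] + w_join * dist(p.end, u.start)
                      for i, p in enumerate(lattice[t - 1])]
            best_i = min(range(len(scores)), key=scores.__getitem__)
            new_cost.append(scores[best_i] + dist(u.target, targets[t]))
            ptr.append(best_i)
        cost = new_cost
        back.append(ptr)
    j = min(range(len(cost)), key=cost.__getitem__)
    total = cost[j]
    path = [j]
    for ptr in reversed(back):  # trace predecessors from the last slot
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [lattice[t][k].uid for t, k in enumerate(path)], total


def select_units(index, spec, n=2, w_join=1.0):
    """spec: list of (label, target_vec) pairs describing the utterance."""
    lattice = []
    for label, vec in spec:
        cands = index.get(label, [])
        if not cands:
            raise KeyError(f"no units for label {label!r}")
        lattice.append(n_best(cands, vec, n))
    return viterbi(lattice, [vec for _, vec in spec], w_join)


# Toy corpus with 1-D features (illustrative values only)
units = [
    Unit(0, "a", (1.0,), (0.0,), (1.0,)),
    Unit(1, "a", (1.2,), (0.0,), (5.0,)),
    Unit(2, "a", (9.0,), (0.0,), (0.0,)),  # pruned by the N-best step
    Unit(3, "k", (2.0,), (5.0,), (0.0,)),
    Unit(4, "k", (2.1,), (1.0,), (0.0,)),
]
spec = [("a", (1.0,)), ("k", (2.0,))]
print(select_units(build_index(units), spec, n=2))  # unit ids [0, 4], cost about 0.1
```

In this sketch, the saving comes from the Viterbi step seeing only N candidates per slot instead of every hash hit, which cuts its cost from roughly T·M² to T·N² for T slots and M hash candidates. The brute-force `n_best` still scans all M candidates; a practical system would replace it with an indexed nearest neighbor search. In the toy run, unit 2 never reaches the lattice, and the join cost makes unit 4 preferable to unit 3 even though unit 3 matches its target exactly.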