Building an Effective Speech Corpus by Utilizing Statistical Multidimensional Scaling Method

Authors:
Goshu Nagino;Makoto Shozakai;Tomoki Toda;Hiroshi Saruwatari;Kiyohiro Shikano
Venue:
IEICE - Transactions on Information and Systems
Year:
2008

Abstract

This paper proposes a technique for building an effective speech corpus at lower cost by using a statistical multidimensional scaling method, which visualizes multiple HMM acoustic models in a two-dimensional space. First, a small number of voice samples is collected from each speaker; speaker-adapted acoustic models trained on the collected utterances are then mapped into the two-dimensional space with the statistical multidimensional scaling method. Next, speakers located on the periphery of the distribution in the plotted map are selected, and a speech corpus is built by collecting a sufficient number of voice samples from the selected speakers. In an experiment on building an isolated-word speech corpus, an acoustic model trained with 200 selected speakers performed equivalently to one trained with 533 non-selected speakers, which corresponds to a cost reduction of more than 62%. In an experiment on building a continuous-word speech corpus, an acoustic model trained with 500 selected speakers performed equivalently to one trained with 1179 non-selected speakers, which corresponds to a cost reduction of more than 57%.
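
The selection step and the reported savings can be illustrated with a minimal sketch. It rests on two assumptions that the abstract does not confirm: that "periphery" can be approximated by distance from the centroid of the two-dimensional map, and that cost scales linearly with the number of speakers recorded. The function names `select_peripheral` and `cost_reduction` and the toy coordinates are hypothetical; the map itself is taken as given and is not computed here.

```python
import math


def select_peripheral(points, n_select):
    """Return indices of the n_select points farthest from the centroid.

    Assumption: distance from the centroid stands in for "located on the
    periphery"; the paper's actual selection criterion may differ.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    dist = [math.hypot(x - cx, y - cy) for x, y in points]
    # Sort speaker indices by decreasing distance from the centroid.
    order = sorted(range(len(points)), key=lambda i: dist[i], reverse=True)
    return order[:n_select]


def cost_reduction(n_selected, n_baseline):
    """Fraction of speakers saved, assuming cost is proportional to speaker count."""
    return 1.0 - n_selected / n_baseline


# Toy 2-D map: one speaker per point (hypothetical coordinates).
toy_map = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 5.0), (0.0, -5.0)]
print(select_peripheral(toy_map, 2))

# Speaker counts reported in the abstract.
print(f"isolated-word:   {cost_reduction(200, 533):.1%}")
print(f"continuous-word: {cost_reduction(500, 1179):.1%}")
```

Under the linear-cost assumption, the speaker counts in the abstract give reductions of about 62.5% and 57.6%, consistent with the stated "more than 62%" and "more than 57%".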