A robust/fast spoken term detection method based on a syllable n-gram index with a distance metric

  • Authors:
  • Seiichi Nakagawa;Keisuke Iwami;Yasuhisa Fujii;Kazumasa Yamamoto

  • Affiliations:
  • Department of Computer Science and Engineering, Toyohashi University of Technology, 1-1 Hibarigaoka, Tempaku-cho, Toyohashi, Aichi 441-8580, Japan;Department of Computer Science and Engineering, Toyohashi University of Technology, 1-1 Hibarigaoka, Tempaku-cho, Toyohashi, Aichi 441-8580, Japan;Department of Computer Science and Engineering, Toyohashi University of Technology, 1-1 Hibarigaoka, Tempaku-cho, Toyohashi, Aichi 441-8580, Japan;Department of Computer Science and Engineering, Toyohashi University of Technology, 1-1 Hibarigaoka, Tempaku-cho, Toyohashi, Aichi 441-8580, Japan

  • Venue:
  • Speech Communication
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

For spoken document retrieval, it is crucial to consider Out-of-vocabulary (OOV) and the mis-recognition of spoken words. Consequently, sub-word unit based recognition and retrieval methods have been proposed. This paper describes a Japanese spoken term detection method for spoken documents that robustly considers OOV words and mis-recognition. To solve the problem of OOV keywords, we use individual syllables as the sub-word unit in continuous speech recognition. To address OOV words, recognition errors, and high-speed retrieval, we propose a distant n-gram indexing/retrieval method that incorporates a distance metric in a syllable lattice. When applied to syllable sequences, our proposed method outperformed a conventional DTW method between syllable sequences and was about 100 times faster. The retrieval results show that we can detect OOV words in a database containing 44h of audio in less than 10msec per query with an F-measure of 0.54.