A robust/fast spoken term detection method based on a syllable n-gram index with a distance metric

Authors:
Seiichi Nakagawa;Keisuke Iwami;Yasuhisa Fujii;Kazumasa Yamamoto
Affiliations:
Department of Computer Science and Engineering, Toyohashi University of Technology, 1-1 Hibarigaoka, Tempaku-cho, Toyohashi, Aichi 441-8580, Japan;Department of Computer Science and Engineering, Toyohashi University of Technology, 1-1 Hibarigaoka, Tempaku-cho, Toyohashi, Aichi 441-8580, Japan;Department of Computer Science and Engineering, Toyohashi University of Technology, 1-1 Hibarigaoka, Tempaku-cho, Toyohashi, Aichi 441-8580, Japan;Department of Computer Science and Engineering, Toyohashi University of Technology, 1-1 Hibarigaoka, Tempaku-cho, Toyohashi, Aichi 441-8580, Japan
Venue:
Speech Communication
Year:
2013

Citing 8
Cited 0

New techniques for open-vocabulary spoken document retrieval

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Experiments in syllable-based retrieval of broadcast news speech in Mandarin Chinese

Speech Communication - Special issue on accessing information in spoken audio
Experiments in spoken document retrieval using phoneme n-grams

Speech Communication - Special issue on accessing information in spoken audio
Vocabulary independent spoken term detection

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Japanese spoken document retrieval considering OOV keywords using LVCSR system with OOV detection processing

HLT '02 Proceedings of the second international conference on Human Language Technology Research
Effect of pronounciations on OOV queries in spoken term detection

ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
General indexation of weighted automata: application to spoken utterance retrieval

SpeechIR '04 Proceedings of the Workshop on Interdisciplinary Approaches to Speech Indexing and Retrieval at HLT-NAACL 2004
Large vocabulary speech recognition system: SPOJUS++

ROCOM'11/MUSP'11 Proceedings of the 11th WSEAS international conference on robotics, control and manufacturing technology, and 11th WSEAS international conference on Multimedia systems & signal processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

For spoken document retrieval, it is crucial to consider Out-of-vocabulary (OOV) and the mis-recognition of spoken words. Consequently, sub-word unit based recognition and retrieval methods have been proposed. This paper describes a Japanese spoken term detection method for spoken documents that robustly considers OOV words and mis-recognition. To solve the problem of OOV keywords, we use individual syllables as the sub-word unit in continuous speech recognition. To address OOV words, recognition errors, and high-speed retrieval, we propose a distant n-gram indexing/retrieval method that incorporates a distance metric in a syllable lattice. When applied to syllable sequences, our proposed method outperformed a conventional DTW method between syllable sequences and was about 100 times faster. The retrieval results show that we can detect OOV words in a database containing 44h of audio in less than 10msec per query with an F-measure of 0.54.