Spoken document retrieval must cope with out-of-vocabulary (OOV) words and with the misrecognition of spoken words, and sub-word-unit-based recognition and retrieval methods have been proposed for this reason. This paper describes a Japanese spoken term detection method for spoken documents that is robust to both OOV words and misrecognition. To solve the problem of OOV keywords, we use individual syllables as the sub-word unit in continuous speech recognition. To handle OOV words and recognition errors while keeping retrieval fast, we propose a distant n-gram indexing/retrieval method that incorporates a distance metric in a syllable lattice. When applied to syllable sequences, our proposed method outperformed a conventional DTW method between syllable sequences and was about 100 times faster. The retrieval results show that we can detect OOV words in a database containing 44 hours of audio in less than 10 ms per query with an F-measure of 0.54.
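The abstract gives no details of the distant n-gram index, so the following is only a toy sketch of the general idea and not the authors' method. It assumes 1-best syllable sequences instead of a lattice, indexes syllable bigrams together with the number of syllables skipped between the pair, and uses the summed skip count as a stand-in for the distance metric. The names `build_index`, `search`, `max_skip`, and `max_cost` are hypothetical.

```python
from collections import defaultdict


def build_index(docs, max_skip=1):
    """Map each syllable pair to postings (doc_id, position, skip).

    `skip` is the number of syllables lying between the two members of the
    pair, so skip=0 is an ordinary bigram. (Assumed toy distance.)
    """
    index = defaultdict(list)
    for doc_id, syllables in docs.items():
        for i, first in enumerate(syllables):
            for skip in range(max_skip + 1):
                j = i + 1 + skip
                if j < len(syllables):
                    index[(first, syllables[j])].append((doc_id, i, skip))
    return index


def search(index, query, max_cost=1):
    """Return sorted (doc_id, start, cost) hits for a syllable query.

    Adjacent query pairs are chained through the index; `cost` is the total
    number of skipped syllables along the cheapest chain.
    """
    if len(query) < 2:
        raise ValueError("query needs at least two syllables")
    pairs = list(zip(query, query[1:]))

    # states[(doc_id, end_position)] -> {start_position: cost so far}
    states = defaultdict(dict)
    for doc_id, i, skip in index.get(pairs[0], []):
        if skip <= max_cost:
            states[(doc_id, i + 1 + skip)][i] = skip

    for pair in pairs[1:]:
        next_states = defaultdict(dict)
        for doc_id, i, skip in index.get(pair, []):
            # Extend only chains that ended exactly where this pair begins.
            for start, cost in states.get((doc_id, i), {}).items():
                total = cost + skip
                if total <= max_cost:
                    bucket = next_states[(doc_id, i + 1 + skip)]
                    bucket[start] = min(total, bucket.get(start, total))
        states = next_states

    hits = {}
    for (doc_id, _end), starts in states.items():
        for start, cost in starts.items():
            key = (doc_id, start)
            hits[key] = min(cost, hits.get(key, cost))
    return sorted((doc_id, start, cost) for (doc_id, start), cost in hits.items())


# Made-up recognizer output: "d2" contains a spurious inserted syllable "a".
docs = {
    "d1": ["to", "o", "kyo", "o"],
    "d2": ["to", "a", "o", "kyo"],
}
index = build_index(docs, max_skip=1)
hits = search(index, ["to", "o", "kyo"], max_cost=1)
```

In this sketch a query is answered by a few dictionary lookups per syllable pair rather than by aligning the query against every stored sequence, which is the kind of saving an index offers over exhaustive DTW matching. The skip allowance only tolerates inserted syllables; handling substitutions or deletions would need a richer distance than the one assumed here.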