An approach for efficient open vocabulary spoken term detection

Authors:
Atta Norouzian;Richard Rose
Venue:
Speech Communication
Year:
2014

Abstract

A hybrid two-pass approach for fast and efficient open-vocabulary spoken term detection (STD) is presented in this paper. A large vocabulary continuous speech recognition (LVCSR) system is deployed to produce word lattices from audio recordings. An index construction technique enables very fast search of the lattices for occurrences of both in-vocabulary (IV) and out-of-vocabulary (OOV) query terms. Efficient search for query terms is performed in two passes. In the first pass, a subword approach is used to identify, from the index, audio segments that are likely to contain occurrences of the IV and OOV query terms. In the second pass, a more detailed subword-based search verifies the occurrence of the query terms in the candidate segments. The performance of this STD system is evaluated on an open-vocabulary STD task defined on a lecture-domain corpus. It is shown that the indexing method presented here yields an index that is nearly two orders of magnitude smaller than the LVCSR lattices while preserving most of the information relevant for STD. Furthermore, despite the use of word lattices for constructing the index, 67% of the segments containing occurrences of the OOV query terms are identified from the index in the first pass. Finally, it is shown that the subword-based term detection performed in the second pass reduces the performance gap between OOV and IV query terms.
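The two-pass idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify the index structure or the verification procedure, so the sketch assumes a phone-bigram inverted index for the first pass and an approximate-substring edit distance for the second. All names (`build_index`, `first_pass`, `best_match_cost`, `second_pass`), the thresholds, and the toy phone sequences are hypothetical, and each segment is reduced to a single phone string rather than a word lattice.

```python
from collections import defaultdict


def ngrams(seq, n=2):
    """Return all contiguous n-grams of a phone sequence as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def build_index(segments, n=2):
    """Map each phone n-gram to the set of segment ids that contain it."""
    index = defaultdict(set)
    for seg_id, phones in segments.items():
        for gram in ngrams(phones, n):
            index[gram].add(seg_id)
    return index


def first_pass(index, query, n=2, min_overlap=0.5):
    """Coarse pass: keep segments sharing enough n-grams with the query."""
    grams = set(ngrams(query, n))
    hits = defaultdict(int)
    for gram in grams:
        for seg_id in index.get(gram, ()):
            hits[seg_id] += 1
    return {s for s, count in hits.items() if count / len(grams) >= min_overlap}


def best_match_cost(query, phones):
    """Smallest edit distance between the query and any substring of phones."""
    prev = [0] * (len(phones) + 1)  # a match may start at any position for free
    for i, q in enumerate(query, start=1):
        curr = [i] + [0] * len(phones)
        for j, p in enumerate(phones, start=1):
            curr[j] = min(prev[j] + 1,             # query phone missing in segment
                          curr[j - 1] + 1,         # extra phone in segment
                          prev[j - 1] + (q != p))  # match or substitution
        prev = curr
    return min(prev)  # a match may end at any position


def second_pass(segments, candidates, query, max_cost=1):
    """Detailed pass: verify candidates with approximate phone matching."""
    return sorted(s for s in candidates
                  if best_match_cost(query, segments[s]) <= max_cost)


# Toy data (assumed): one phone sequence per audio segment.
segments = {
    "seg1": "dh ax l ae t ax s ih z".split(),
    "seg2": "s p iy ch r eh k".split(),
    "seg3": "l ae t er ax n".split(),
}
query = "l ae t ax s".split()  # assumed pronunciation of the query term

index = build_index(segments)
candidates = first_pass(index, query)
detections = second_pass(segments, candidates, query)
```

In this toy run the first pass retains `seg1` and `seg3`, and the second pass keeps only `seg1`. The coarse pass touches only the index, while the costlier alignment runs on the few candidate segments, which is the general motivation for a two-pass design.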