Effect of pronunciations on OOV queries in spoken term detection

Authors:
Dogan Can;Erica Cooper;Abhinav Sethy;Chris White;Bhuvana Ramabhadran;Murat Saraclar
Affiliations:
Bogazici University, Turkey;Massachusetts Institute of Technology, USA;IBM, USA;Johns Hopkins University, USA;IBM, USA;Bogazici University, Turkey
Venue:
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
Year:
2009

Abstract

The spoken term detection (STD) task aims to return relevant segments from a spoken archive that contain the query terms, whether or not those terms are in the system vocabulary. This paper focuses on pronunciation modeling for out-of-vocabulary (OOV) terms, which occur frequently in STD queries. The STD system described in this paper uses Weighted Finite State Transducers (WFSTs) to index word-level and sub-word-level lattices or confusion networks produced by a large-vocabulary continuous speech recognition (LVCSR) system. We investigate including the n-best pronunciation variants of OOV terms, obtained from letter-to-sound rules, in the search, and present results obtained by indexing both confusion networks and lattices. Two observations stand out: phone indexes generated from sub-word units represent OOV terms well, and using too many pronunciation variants for an OOV term degrades performance if the pronunciations are not weighted.
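The last observation, that unweighted pronunciation variants can hurt, can be illustrated with a toy example. The sketch below is not the paper's WFST-based index. It assumes a phone-level confusion network represented as a list of bins mapping phones to posteriors, ignores epsilon (deletion) arcs, and uses hypothetical function names (`match_scores`, `search_oov`), phone strings, and letter-to-sound weights. A variant's match score is the product of posteriors over consecutive bins, and the scores of all variants that hit the same start bin are summed, optionally scaled by each variant's weight.

```python
from typing import Dict, List, Sequence, Tuple

# Toy phone-level confusion network (CN): one bin per slot, each mapping a
# phone label to its posterior. Epsilon (deletion) arcs are ignored here.
ConfusionNetwork = List[Dict[str, float]]


def match_scores(cn: ConfusionNetwork, phones: Sequence[str]) -> Dict[int, float]:
    """Return {start_bin: score} for every start bin where `phones` can be
    read off consecutive bins; the score is the product of the posteriors."""
    hits: Dict[int, float] = {}
    for start in range(len(cn) - len(phones) + 1):
        score = 1.0
        for offset, phone in enumerate(phones):
            score *= cn[start + offset].get(phone, 0.0)
            if score == 0.0:
                break  # phone absent from this bin: no match at this start
        if score > 0.0:
            hits[start] = score
    return hits


def search_oov(
    cn: ConfusionNetwork,
    variants: List[Tuple[Sequence[str], float]],
    weighted: bool = True,
) -> Dict[int, float]:
    """Score a query given (pronunciation, weight) pairs, e.g. hypothetical
    n-best letter-to-sound output. Variant scores at the same start bin are
    summed; weighted=False gives every variant a weight of 1."""
    combined: Dict[int, float] = {}
    for phones, weight in variants:
        w = weight if weighted else 1.0
        for start, score in match_scores(cn, phones).items():
            combined[start] = combined.get(start, 0.0) + w * score
    return combined


if __name__ == "__main__":
    # Hypothetical data: bins 0-2 hold the true occurrence ("k ae t"),
    # bins 3-5 hold unrelated speech that matches an unlikely variant.
    cn = [
        {"k": 0.9, "g": 0.1}, {"ae": 0.8, "eh": 0.2}, {"t": 0.9, "d": 0.1},
        {"g": 0.9, "k": 0.1}, {"eh": 0.9, "ae": 0.1}, {"d": 0.9, "t": 0.1},
    ]
    variants = [(["k", "ae", "t"], 0.8), (["g", "eh", "d"], 0.05)]
    weighted = search_oov(cn, variants, weighted=True)
    unweighted = search_oov(cn, variants, weighted=False)
    # Weighted: bin 0 ranks first. Unweighted: the false alarm at bin 3 wins.
    print("weighted:", max(weighted, key=weighted.get), weighted)
    print("unweighted:", max(unweighted, key=unweighted.get), unweighted)
```

In this toy data the low-confidence variant matches a different region strongly. Without weights its score outranks the true occurrence; scaling each variant by its letter-to-sound weight keeps the true hit on top. Summing variant scores is only one possible combination rule; taking the maximum over variants is another option.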