Web derived pronunciations for spoken term detection

Authors:
Dogan Can;Erica Cooper;Arnab Ghoshal;Martin Jansche;Sanjeev Khudanpur;Bhuvana Ramabhadran;Michael Riley;Murat Saraclar;Abhinav Sethy;Morgan Ulinski;Christopher White
Affiliations:
Bogazici University, Istanbul, Turkey;MIT, Cambridge, MA, USA;Johns Hopkins University, Baltimore, MD, USA;Google, Inc., New York, NY, USA;Johns Hopkins University, Baltimore, MD, USA;IBM T. J. Watson Research Center, Yorktown Heights, NY, USA;Google, Inc., New York, NY, USA;Bogazici University, Istanbul, Turkey;IBM T. J. Watson Research Center, Yorktown Heights, NY, USA;Cornell University, Ithaca, NY, USA;Johns Hopkins University, Baltimore, MD, USA
Venue:
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Year:
2009

Abstract

Indexing and retrieval of speech content in forms such as broadcast news, customer care data and on-line media has attracted considerable interest for a wide range of applications, from customer analytics to on-line media search. For most retrieval applications, the speech content is first converted to a lexical or phonetic representation using automatic speech recognition (ASR). The first step in searching indexes built on these representations is generating pronunciations for named entities and foreign-language query terms. This paper summarizes work conducted during the 2008 JHU Summer Workshop by the Multilingual Spoken Term Detection team on mining the Web for pronunciations and analyzing their impact on spoken term detection. We first present methods for using the vast amount of pronunciation information available on the Web in the form of IPA and ad-hoc transcriptions. We describe techniques for extracting candidate pronunciations from Web pages and associating them with orthographic words, filtering out poorly extracted pronunciations, normalizing IPA pronunciations to better conform to a common transcription standard, and generating phonemic representations from ad-hoc transcriptions. We then analyze how effectively these pronunciations represent out-of-vocabulary (OOV) query terms in a spoken term detection (STD) system. We compare Web pronunciations against automated pronunciation-generation techniques and against pronunciations produced by human experts. Our results cover a range of speech indexes based on lattices, confusion networks and one-best transcriptions at both the word and word-fragment levels.
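
To make the extract, normalize and filter steps concrete, here is a minimal sketch in Python. It is not the workshop team's method: the delimiter pattern (a word followed by a transcription in slashes or square brackets), the small phone inventory in `ALLOWED`, and the function names `normalize` and `extract` are all assumptions chosen for illustration.

```python
import re
import unicodedata

# Assumed page convention: an orthographic word followed by a transcription
# delimited by slashes or square brackets, e.g. "tomato /.../" or "gig [...]".
CANDIDATE = re.compile(r"\b([A-Za-z][A-Za-z'-]*)\s*(?:/([^/\s]+)/|\[([^\]\s]+)\])")

# Illustrative, deliberately small symbol inventory (an assumption, not a standard).
ALLOWED = set("abdefhijklmnoprstuvwz" "æðŋɑɔəɛɪʃʊʌʒθ" "\u0261")

# Stress marks, length mark and syllable dots are dropped during normalization.
STRIP = {"ˈ", "ˌ", "ː", "."}


def normalize(ipa):
    """Map a raw transcription onto the assumed common symbol set."""
    ipa = unicodedata.normalize("NFC", ipa)
    ipa = ipa.replace("g", "\u0261")  # ASCII g -> IPA script g (U+0261)
    return "".join(ch for ch in ipa if ch not in STRIP)


def extract(text):
    """Return (word, pronunciation) pairs whose symbols all pass the filter."""
    pairs = []
    for match in CANDIDATE.finditer(text):
        word = match.group(1).lower()
        pron = normalize(match.group(2) or match.group(3))
        # Filter step: discard candidates containing symbols outside the inventory,
        # such as bracketed citation numbers that merely look like transcriptions.
        if pron and all(ch in ALLOWED for ch in pron):
            pairs.append((word, pron))
    return pairs


if __name__ == "__main__":
    sample = "The tomato /təˈmeɪtoʊ/ entry cites [1]; compare gig [gɪg]."
    for word, pron in extract(sample):
        print(word, pron)
```

In this sketch the bracketed citation `[1]` is matched as a candidate and then rejected by the symbol filter, which is a toy version of filtering out poorly extracted pronunciations. A realistic system would need a far richer inventory and a learned or rule-based mapping for ad-hoc transcriptions.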