Direct posterior confidence for out-of-vocabulary spoken term detection

Authors:
Dong Wang;Simon King;Joe Frankel;Ravichander Vipperla;Nicholas Evans;Raphaël Troncy
Affiliations:
Nuance Communications, Aachen, Germany;University of Edinburgh, Edinburgh, UK;University of Edinburgh, Edinburgh, UK;EURECOM, France;EURECOM, France;EURECOM, France
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
2012

Abstract

Spoken term detection (STD) is a key technology for spoken information retrieval. Compared to conventional speech transcription and keyword spotting, STD is an open-vocabulary task and must address out-of-vocabulary (OOV) terms. Approaches based on subword units, for example phones, are widely used to solve the OOV issue; however, performance on OOV terms is still substantially inferior to that on in-vocabulary (INV) terms. The performance degradation on OOV terms can be attributed to a multitude of factors. One particular factor we address in this article is unreliable confidence estimation caused by weak acoustic and language modeling, which results from the absence of OOV terms in the training corpora. We propose a direct posterior confidence derived from a discriminative model, such as a multilayer perceptron (MLP). The new confidence considers a wide-range acoustic context, which is usually important for speech recognition and retrieval; moreover, it localizes on detected speech segments and therefore avoids the impact of long-span word context, which is usually unreliable for OOV term detection. In this article, we first discuss at length the modeling weakness problem associated with OOV terms, and then propose our approach to address this problem based on direct posterior confidence. Our experiments, carried out on spontaneous and conversational multiparty meeting speech, demonstrate that the proposed technique provides a significant improvement in STD performance compared to conventional lattice-based confidence, in particular for OOV terms. Furthermore, the new confidence estimation approach is fused with other advanced techniques for OOV treatment, such as stochastic pronunciation modeling and discriminative confidence normalization. This leads to an integrated solution for OOV term detection that results in a large performance improvement.
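
As a rough illustration of what a segment-localized posterior confidence could look like, the sketch below averages, in the log domain, the posterior of the aligned phone over the frames of one detected segment. It is not the formulation used in the article: the per-frame phone posteriors are assumed to come from some discriminative model such as an MLP, and the function name `segment_confidence`, the dictionary-per-frame input format, and the `floor` parameter are all hypothetical choices made for this example.

```python
import math


def segment_confidence(frame_posteriors, alignment, floor=1e-10):
    """Geometric mean of the aligned phone's posterior over one detected segment.

    frame_posteriors: list of dicts, one per frame, mapping phone label to the
        posterior probability assigned by a discriminative model (assumed input).
    alignment: list of phone labels, one per frame, giving the phone that the
        detected term occupies at that frame (assumed input).
    floor: hypothetical lower bound that keeps log() finite for zero posteriors.
    """
    if not alignment or len(frame_posteriors) != len(alignment):
        raise ValueError("need one aligned phone label per frame")

    log_sum = 0.0
    for posteriors, phone in zip(frame_posteriors, alignment):
        # Only frames inside the detected segment contribute, so no
        # long-span word context enters the score.
        log_sum += math.log(max(posteriors.get(phone, 0.0), floor))

    # Averaging log posteriors makes the score independent of segment length.
    return math.exp(log_sum / len(alignment))


# Toy two-frame detection with made-up posteriors.
toy_posteriors = [{"a": 0.9, "b": 0.1}, {"a": 0.4, "b": 0.6}]
toy_alignment = ["a", "b"]
toy_confidence = segment_confidence(toy_posteriors, toy_alignment)
```

A detection would then be accepted or rejected by comparing such a score against a threshold; how that threshold is chosen, and how the score is normalized across terms, is outside the scope of this sketch.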