Spoken term detection (STD) is a key technology for spoken information retrieval. Compared with conventional speech transcription and keyword spotting, STD is an open-vocabulary task and must handle out-of-vocabulary (OOV) terms. Approaches based on subword units, for example phones, are widely used to address the OOV issue; however, performance on OOV terms is still substantially inferior to that on in-vocabulary (INV) terms. This degradation can be attributed to many factors. The factor we address in this article is unreliable confidence estimation, caused by weak acoustic and language modeling that results from the absence of OOV terms in the training corpora. We propose a direct posterior confidence derived from a discriminative model, such as a multilayer perceptron (MLP). The new confidence takes into account a wide acoustic context, which is usually important for speech recognition and retrieval; moreover, it is localized to the detected speech segments and therefore avoids the impact of long-span word context, which is usually unreliable for OOV term detection. In this article, we first discuss in detail the modeling weakness associated with OOV terms, and then propose our approach to this problem based on direct posterior confidence. Our experiments, carried out on spontaneous, conversational multiparty meeting speech, demonstrate that the proposed technique significantly improves STD performance over conventional lattice-based confidence, in particular for OOV terms. Furthermore, the new confidence estimation approach is fused with other advanced techniques for OOV treatment, such as stochastic pronunciation modeling and discriminative confidence normalization. This leads to an integrated solution for OOV term detection that yields a large performance improvement.
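As a rough illustration of what a segment-localized posterior confidence could look like, the sketch below scores one detection using only the frames inside its detected segment. It is not the article's formulation, which the abstract does not spell out. It assumes that some discriminative classifier (for instance an MLP fed a window of neighboring frames) has already produced per-frame phone posteriors, and that the detection comes with a phone-level alignment. The data layout, the function name `segment_confidence`, the `floor` parameter, and the choice of a geometric mean are all assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout: posteriors[t][p] is the classifier's posterior
# probability of phone p at frame t (each row sums to 1).
# An alignment entry is (phone_id, start_frame, end_frame_exclusive).
Alignment = List[Tuple[int, int, int]]


def segment_confidence(
    posteriors: Sequence[Sequence[float]],
    alignment: Alignment,
    floor: float = 1e-10,
) -> float:
    """Geometric mean of the aligned phone posteriors over a detection.

    Only frames inside the detected segment contribute, so the score does
    not depend on word context outside the detection.
    """
    log_sum = 0.0
    n_frames = 0
    for phone_id, start, end in alignment:
        for t in range(start, end):
            # Floor the probability to avoid log(0) on very poor frames.
            log_sum += math.log(max(posteriors[t][phone_id], floor))
            n_frames += 1
    if n_frames == 0:
        raise ValueError("alignment covers no frames")
    return math.exp(log_sum / n_frames)


# Toy example: two phones, four frames, detection aligned as phone 0
# on frames 0-1 followed by phone 1 on frames 2-3.
toy_posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]]
toy_alignment = [(0, 0, 2), (1, 2, 4)]
toy_score = segment_confidence(toy_posteriors, toy_alignment)
```

Averaging in the log domain keeps the score comparable across detections of different lengths; a real system might weight phones differently or normalize the score per term.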