Direct posterior confidence for out-of-vocabulary spoken term detection

Authors:
Dong Wang;Simon King;Nicholas Evans;Joe Frankel;Raphaël Troncy
Affiliations:
EURECOM, Sophia Antipolis, France;University of Edinburgh, Edinburgh, United Kingdom;EURECOM, Sophia Antipolis, France;University of Edinburgh, Edinburgh, United Kingdom;EURECOM, Sophia Antipolis, France
Venue:
Proceedings of the 2010 international workshop on Searching spontaneous conversational speech
Year:
2010

Citing 3
Cited 1

Effect of pronunciations on OOV queries in spoken term detection

ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
Posterior-based confidence measures for spoken term detection

ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
The AMI meeting transcription system: progress and performance

MLMI'06 Proceedings of the Third international conference on Machine Learning for Multimodal Interaction

Direct posterior confidence for out-of-vocabulary spoken term detection

ACM Transactions on Information Systems (TOIS)

Abstract

Spoken term detection (STD) is a fundamental task in spoken information retrieval. Compared to conventional speech transcription and keyword spotting, STD is an open-vocabulary task and must therefore address out-of-vocabulary (OOV) terms. Approaches based on subword units, e.g. phonemes, are widely used to solve the OOV issue; however, performance on OOV terms is still significantly inferior to that on in-vocabulary (INV) terms. The performance degradation on OOV terms can be attributed to a multitude of factors. A particular factor we address in this paper is that the acoustic and language models used for speech transcription are highly vulnerable to OOV terms, which leads to unreliable confidence measures and error-prone detections. A direct posterior confidence measure derived from discriminative models has been proposed for STD. In this paper, we use this technique to tackle the weakness of confidence estimation for OOV terms. Because neither acoustic models nor language models are included in the computation, the new confidence measure avoids the weak modeling problem with OOV terms. Our experiments, set up on multi-party meeting speech, which is highly spontaneous and conversational, demonstrate that the proposed technique improves STD performance on OOV terms significantly; when combined with conventional lattice-based confidence, a significant improvement in performance is obtained on both INV and OOV terms. Furthermore, the new confidence measure can be combined with other advanced techniques for OOV treatment, such as stochastic pronunciation modeling and term-dependent confidence discrimination, which leads to an integrated solution for OOV STD with greatly improved performance.