Sequential dependency analysis for online spontaneous speech processing. Speech Communication.
Calculating Inverse Filters for Speech Dereverberation. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences.
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Expansion of WFST-based dialog management for handling multiple ASR hypotheses. IWSDS'10 Proceedings of the Second International Conference on Spoken Dialogue Systems for Ambient Environments.
User-adaptive coordination of agent communicative behavior in spoken dialogue. SIGDIAL '10 Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue.
Large vocabulary speech recognition system: SPOJUS++. ROCOM'11/MUSP'11 Proceedings of the 11th WSEAS International Conference on Robotics, Control and Manufacturing Technology, and 11th WSEAS International Conference on Multimedia Systems & Signal Processing.
Index-based incremental language model for scalable directory assistance. Speech Communication.
Joint estimation of confidence and error causes in speech recognition. Speech Communication.
Computer Speech and Language.
This paper proposes a novel one-pass search algorithm with on-the-fly composition of weighted finite-state transducers (WFSTs) for large-vocabulary continuous-speech recognition. In the standard search method with on-the-fly composition, two or more WFSTs are composed during decoding, and a Viterbi search is performed over the composed search space. With the new method, the Viterbi search is performed over only the first of the two WFSTs; the second WFST is used solely to rescore the hypotheses generated during the search. Since this rescoring is very efficient, the total amount of computation required by the new method is almost the same as when using only the first WFST. In a 65k-word-vocabulary spontaneous lecture speech transcription task, the proposed method significantly outperformed the standard search method. Furthermore, it was faster than decoding with a single fully composed and optimized WFST, while using only 38% of the memory required for decoding with the single WFST. Finally, we achieved high-accuracy one-pass real-time speech recognition with an extremely large vocabulary of 1.8 million words.
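The rescoring idea in the abstract can be illustrated with a toy sketch: the first-pass search produces hypotheses with partial scores, and each hypothesis is then walked through the second WFST (e.g. a language-model transducer) to add its arc weights, rather than expanding the composed search space during Viterbi decoding. This is a hypothetical minimal sketch, not the paper's implementation; the WFST representation, state/weight values, and function names below are all assumptions.

```python
# Toy sketch (assumed representation, not the paper's code): a WFST is a dict
# mapping state -> {input_label: (next_state, weight)}, with weights in the
# tropical semiring (lower cost = better).

def rescore(hypothesis, wfst, start=0):
    """Walk `wfst` along the hypothesis labels, summing arc weights.

    Returns None if the hypothesis is not accepted by the WFST."""
    state, cost = start, 0.0
    for label in hypothesis:
        arc = wfst.get(state, {}).get(label)
        if arc is None:
            return None  # no matching path in the second WFST
        state, w = arc
        cost += w
    return cost

# Hypothetical second WFST (e.g. an LM-level transducer) and first-pass
# hypotheses with their partial Viterbi costs.
lm_wfst = {0: {"hello": (1, 0.5)}, 1: {"world": (2, 0.25)}}
hyps = [(["hello", "world"], 3.0), (["hello", "word"], 2.8)]

# Keep only hypotheses accepted by the second WFST and pick the lowest
# combined cost; the Viterbi search itself never touched the composition.
best = min(
    ((h, s + r) for h, s in hyps
     if (r := rescore(h, lm_wfst)) is not None),
    key=lambda x: x[1],
)
print(best)  # → (['hello', 'world'], 3.75)
```

Because the second WFST only adds weights along already-generated label sequences, its cost is linear in hypothesis length, which is why the overall computation stays close to decoding with the first WFST alone.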