Recent innovations in speech-to-text transcription at SRI-ICSI-UW

Authors:
A. Stolcke; Barry Chen; H. Franco; Venkata Ramana Rao Gadde; M. Graciarena; Mei-Yuh Hwang; K. Kirchhoff; A. Mandal; N. Morgan; Xin Lei; T. Ng; M. Ostendorf; K. Sonmez; A. Venkataraman; D. Vergyri; Wen Wang; Jing Zheng; Qifeng Zhu
Affiliations:
SRI Int., Menlo Park, CA (A. Stolcke; affiliations of the other authors not listed)
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2006

Abstract

We summarize recent progress in automatic speech-to-text transcription at SRI, ICSI, and the University of Washington. The work encompasses all components of speech modeling found in a state-of-the-art recognition system: acoustic features, acoustic modeling and adaptation, and language modeling. In the front end, we experimented with nonstandard features, including various measures of voicing, discriminative phone posterior features estimated by multilayer perceptrons, and a novel phone-level macro-averaging for cepstral normalization. Acoustic modeling was improved by combining front ends operating at multiple frame rates and by modifying the standard methods for discriminative Gaussian estimation. We show that acoustic adaptation can be improved by predicting the optimal regression class complexity for a given speaker. Language modeling innovations include a syntax-motivated almost-parsing language model and principled vocabulary-selection techniques. Finally, we address portability issues, such as the use of imperfect training transcripts and the language-specific adjustments required for recognition of Arabic and Mandarin.
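The abstract names phone-level macro-averaging for cepstral normalization but does not give the procedure. The Python sketch below is for illustration only and rests on an assumed interpretation:

1. Estimate a mean cepstral vector separately for each phone class, using frame-level phone labels (e.g., from a first-pass alignment).
2. Average those phone means with equal weight.

Under this reading, frequent classes such as silence cannot dominate the normalization mean the way they do in an ordinary per-frame average. The function names `macro_averaged_mean` and `normalize` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def macro_averaged_mean(frames: Sequence[Sequence[float]],
                        phone_labels: Sequence[str]) -> List[float]:
    """Cepstral mean as the unweighted average of per-phone means.

    Assumption (not from the paper): one phone label per frame is available,
    e.g. from a first-pass alignment. Every phone class contributes equally,
    regardless of how many frames it occupies.
    """
    if not frames or len(frames) != len(phone_labels):
        raise ValueError("need at least one frame and one phone label per frame")
    dim = len(frames[0])
    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0] * dim)
    counts: Dict[str, int] = defaultdict(int)
    # Accumulate per-phone sums and frame counts.
    for vec, phone in zip(frames, phone_labels):
        acc = sums[phone]
        for d in range(dim):
            acc[d] += vec[d]
        counts[phone] += 1
    # Mean within each phone class, then a plain average across classes.
    phone_means = [[s / counts[p] for s in sums[p]] for p in sums]
    return [sum(m[d] for m in phone_means) / len(phone_means) for d in range(dim)]


def normalize(frames: Sequence[Sequence[float]],
              mean: Sequence[float]) -> List[List[float]]:
    """Subtract the (macro-averaged) mean from every cepstral frame."""
    return [[x - mu for x, mu in zip(vec, mean)] for vec in frames]
```

Take frames `[[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [4.0, 2.0]]` labeled `sil, sil, sil, aa`. The ordinary frame average is `[1.75, 0.5]`, while the macro-averaged mean is `[2.5, 1.0]`, because the single `aa` frame carries as much weight as the three silence frames together.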
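The abstract does not say which voicing measures were used. As an illustration, one widely used measure is the peak of a frame's normalized autocorrelation over lags that correspond to plausible pitch periods. Values near 1 indicate a strongly periodic (voiced) frame, and values near 0 indicate a noise-like (unvoiced) one. The 8 kHz sample rate and the 60 to 400 Hz pitch search range below are assumed defaults, and `autocorr_voicing` is a hypothetical name.

```python
import math
from typing import Sequence


def autocorr_voicing(frame: Sequence[float], sample_rate: int = 8000,
                     min_f0: float = 60.0, max_f0: float = 400.0) -> float:
    """Peak normalized autocorrelation over lags for min_f0..max_f0 (assumed range)."""
    n = len(frame)
    if n == 0:
        return 0.0
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC offset
    if sum(s * s for s in x) == 0.0:
        return 0.0  # silent or constant frame: treat as unvoiced
    min_lag = int(sample_rate / max_f0)
    max_lag = min(int(sample_rate / min_f0), n - 1)
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        num = sum(x[i] * x[i + lag] for i in range(n - lag))
        # Normalize by the energies of the two overlapping segments,
        # so the score lies in [-1, 1] (Cauchy-Schwarz).
        e1 = sum(x[i] ** 2 for i in range(n - lag))
        e2 = sum(x[i] ** 2 for i in range(lag, n))
        if e1 > 0.0 and e2 > 0.0:
            best = max(best, num / math.sqrt(e1 * e2))
    return best
```

A 100 Hz sinusoid sampled at 8 kHz repeats every 80 samples, so its score at lag 80 is essentially 1. White noise stays well below that.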