We summarize recent progress in automatic speech-to-text transcription at SRI, ICSI, and the University of Washington. The work encompasses all components of speech modeling found in a state-of-the-art recognition system, from acoustic features, to acoustic modeling and adaptation, to language modeling. In the front end, we experimented with nonstandard features, including various measures of voicing, discriminative phone posterior features estimated by multilayer perceptrons, and a novel phone-level macro-averaging for cepstral normalization. Acoustic modeling was improved with combinations of front ends operating at multiple frame rates, as well as by modifications to the standard methods for discriminative Gaussian estimation. We show that acoustic adaptation can be improved by predicting the optimal regression class complexity for a given speaker. Language modeling innovations include the use of a syntax-motivated almost-parsing language model, as well as principled vocabulary-selection techniques. Finally, we address portability issues, such as the use of imperfect training transcripts, and language-specific adjustments required for recognition of Arabic and Mandarin.
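The abstract names phone-level macro-averaging for cepstral normalization without giving details. The sketch below shows one plausible reading, not the authors' implementation: the normalization mean is the average of per-phone mean vectors rather than the average over all frames, so frequent phones do not dominate the estimate. The function names `macro_averaged_mean` and `normalize` are hypothetical, and the per-frame phone labels are assumed to come from some earlier alignment pass.

```python
from collections import defaultdict


def macro_averaged_mean(frames, phone_labels):
    """Average the per-phone mean vectors so each phone class counts equally.

    Assumption: `frames` is a list of equal-length feature vectors and
    `phone_labels` gives one phone label per frame.
    """
    if len(frames) != len(phone_labels):
        raise ValueError("need exactly one phone label per frame")
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for vec, phone in zip(frames, phone_labels):
        counts[phone] += 1
        acc = sums[phone]
        for d in range(dim):
            acc[d] += vec[d]
    # Mean of the per-phone means (macro average), not the mean over all frames.
    n_phones = len(counts)
    return [
        sum(sums[p][d] / counts[p] for p in counts) / n_phones
        for d in range(dim)
    ]


def normalize(frames, phone_labels):
    """Subtract the macro-averaged mean from every frame."""
    mean = macro_averaged_mean(frames, phone_labels)
    return [[x - m for x, m in zip(vec, mean)] for vec in frames]
```

For example, with three frames of phone `a` at value 1.0 and one frame of phone `b` at value 5.0, the ordinary frame-level mean is 2.0, while the macro-averaged mean is 3.0 because each phone contributes equally. Variance normalization could be handled the same way, but is left out to keep the sketch short.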