Rethinking of computation for future-generation, knowledge-rich speech recognition and understanding

Authors:
Li Deng
Affiliations:
Microsoft Research, Redmond, WA
Venue:
ICME '09: Proceedings of the 2009 IEEE International Conference on Multimedia and Expo
Year:
2009

Abstract

A new trend is emerging in the semiconductor industry: future computational speedups will likely come more from parallelism than from faster individual computing elements. Most designers of algorithms for current HMM-based speech recognition systems, whose recognition performance remains significantly below that of humans, have not embraced this trend. This is partly because state-of-the-art sequential algorithms incorporate extremely clever schemes for speeding up single-processor performance, schemes that have been developed and refined over many years. This invited presentation advances two arguments. First, future generations of much more powerful speech systems will likely approach human performance through new architectures that integrate rich knowledge sources and overcome the reasonably well-understood limitations of current HMM-based systems. Second, the success of this endeavor will require a complete rethinking of computation, likely discarding the traditional HMM-centric sequential processing in favor of parallel computing in new architectures that mimic key aspects of the human speech processing system. This paper provides four case studies, drawn from recent influential work, that may form the foundation of this potentially active research area.
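To make the contrast between "HMM-centric sequential processing" and parallel computing concrete, here is a minimal, hypothetical sketch (not taken from the paper) of log-domain Viterbi decoding for a discrete HMM. The recursion over frames is inherently sequential, because the scores at frame t depend on those at frame t-1. By contrast, within a frame each state's update is independent of the others and is the natural candidate for parallel hardware. The function name `viterbi` and its argument layout are illustrative assumptions. Production decoders typically add beam pruning and other single-processor optimizations that this sketch omits.

```python
import math
from typing import List, Tuple


def viterbi(log_init: List[float],
            log_trans: List[List[float]],
            log_emit: List[List[float]]) -> Tuple[float, List[int]]:
    """Log-domain Viterbi decoding for a discrete HMM (illustrative sketch).

    log_init[j]     : log P(state j at t = 0)
    log_trans[i][j] : log P(state j at t | state i at t - 1)
    log_emit[t][j]  : log P(observation at frame t | state j), precomputed
    """
    n_states = len(log_init)
    n_frames = len(log_emit)
    # Frame 0: prior plus the first frame's emission score for each state.
    delta = [log_init[j] + log_emit[0][j] for j in range(n_states)]
    backptr: List[List[int]] = []
    for t in range(1, n_frames):  # sequential: frame t needs frame t - 1
        new_delta: List[float] = []
        bp: List[int] = []
        # Each state j is scored independently of the other states in this
        # frame, so this inner loop is what could be spread across parallel units.
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: delta[i] + log_trans[i][j])
            bp.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j] + log_emit[t][j])
        delta = new_delta
        backptr.append(bp)
    # Pick the best final state, then follow back-pointers to recover the path.
    last = max(range(n_states), key=lambda j: delta[j])
    path = [last]
    for bp in reversed(backptr):
        path.append(bp[path[-1]])
    path.reverse()
    return delta[last], path


# Toy two-state example (the textbook rainy = 0 / sunny = 1 model).
L = math.log
score, path = viterbi(
    [L(0.6), L(0.4)],
    [[L(0.7), L(0.3)], [L(0.4), L(0.6)]],
    [[L(0.1), L(0.6)], [L(0.4), L(0.3)], [L(0.5), L(0.1)]],
)
print(path)  # → [1, 0, 0]
```

Working in the log domain turns products of probabilities into sums, which avoids numerical underflow on long utterances. The best path in the toy example has probability 0.01344, that is, exp(score).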