Command and control (C&C) speech recognition allows users to interact with a system by speaking commands or asking questions restricted to a fixed grammar containing pre-defined phrases. Whereas C&C interaction has been commonplace in telephony and accessibility systems for many years, only recently have mobile devices had the memory and processing capacity to support client-side speech recognition. Given the personal nature of mobile devices, statistical models that can predict commands based in part on past user behavior hold promise for improving C&C recognition accuracy. For example, if a user calls a spouse at the end of every workday, the language model could be adapted to weight the spouse more than other contacts during that time. In this paper, we describe and assess statistical models learned from a large population of users for predicting the next user command of a commercial C&C application. We explain how these models were used for language modeling, and evaluate their performance in terms of task completion. The best performing model achieved a 26% relative reduction in error rate compared to the base system. Finally, we investigate the effects of personalization on performance at different learning rates via online updating of model parameters based on individual user data. Personalization significantly increased relative reduction in error rate by an additional 5%.
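The adaptation idea described above — weighting likely commands using past user behavior, with online updates to model parameters — can be illustrated with a minimal sketch. This is not the paper's actual model; the class name, the linear-interpolation scheme, and the `learning_rate` parameter are all assumptions chosen for clarity. It blends a population-level command prior with per-user counts that are updated online, so a frequently repeated command (e.g. calling a spouse) gradually receives more probability mass.

```python
from collections import defaultdict

class PersonalizedCommandModel:
    """Sketch (hypothetical, not the paper's model): interpolate a
    population-level command prior with a per-user estimate that is
    updated online from observed commands."""

    def __init__(self, population_prior, learning_rate=0.1):
        self.population_prior = dict(population_prior)  # P(command) over all users
        self.learning_rate = learning_rate              # controls personalization speed
        self.user_counts = defaultdict(float)           # per-user command counts
        self.total = 0.0                                # total observed user commands

    def observe(self, command):
        """Online update: record one command spoken by this user."""
        self.user_counts[command] += 1.0
        self.total += 1.0

    def probability(self, command):
        """Linear interpolation of the user-specific estimate and the prior."""
        pop = self.population_prior.get(command, 0.0)
        if self.total == 0.0:
            return pop
        user = self.user_counts[command] / self.total
        # Weight on user data grows with evidence, scaled by the learning rate.
        lam = min(1.0, self.learning_rate * self.total)
        return lam * user + (1.0 - lam) * pop
```

With a prior of 0.2 for "call spouse", repeatedly observing that command raises its probability above the population baseline, which is the qualitative effect the paper exploits when reweighting the language model.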