Architecture, User Interface, and Enabling Technology in Windows Vista's Speech Systems

Authors:
Julian Odell;Kunal Mukerjee
Venue:
IEEE Transactions on Computers
Year:
2007

Abstract

Existing speech recognition systems have claimed high accuracy for specific tasks such as dictation. What is new in Windows Speech Recognition for Vista is the combination of high accuracy and high usability across the end-to-end speech experience. This paper describes the architecture, user interface, and key technologies that make up the speech system incorporated in Microsoft Windows Vista. It outlines some of the challenges encountered in providing a speech-based interface to a system as complex and extensible as the modern desktop PC, as well as the technology developments that have made this possible. In particular, the paper describes key elements of the speech user interface and how they maintain the user's ability to control the system despite limitations in the underlying recognition technology. The paper also explains how feedback and adaptation systems are used to tailor the experience to each user and to that user's particular style of speaking and use of language.