How to wreck a nice beach you sing calm incense

Authors:
Henry Lieberman;Alexander Faaborg;Waseem Daher;José Espinosa
Affiliations:
MIT Media Laboratory, Cambridge, MA;MIT Media Laboratory, Cambridge, MA;MIT Media Laboratory, Cambridge, MA;MIT Media Laboratory, Cambridge, MA
Venue:
Proceedings of the 10th international conference on Intelligent user interfaces
Year:
2005

Abstract

A principal problem in speech recognition is distinguishing between words and phrases that sound similar but have different meanings. Speech recognition programs produce a list of weighted candidate hypotheses for a given audio segment, and choose the "best" candidate. If the choice is incorrect, the user must invoke a correction interface that displays a list of the hypotheses and choose the desired one. The correction interface is time-consuming, and accounts for much of the frustration of today's dictation systems. Conventional dictation systems prioritize hypotheses based on language models derived from statistical techniques such as n-grams and Hidden Markov Models. We propose a supplementary method for ordering hypotheses based on Commonsense Knowledge. We filter acoustical and word-frequency hypotheses by testing their plausibility with a semantic network derived from 700,000 statements about everyday life. This often filters out possibilities that "don't make sense" from the user's viewpoint, and leads to improved recognition. Reducing the hypothesis space in this way also makes possible streamlined correction interfaces that improve the overall throughput of dictation systems.
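
The idea of reordering recognizer hypotheses by semantic plausibility can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the tiny `RELATED` set stands in for the commonsense semantic network, the recognizer scores are invented, and the weighted combination in `rerank` is just one simple way to mix the two signals, not necessarily the authors' method.

```python
from itertools import combinations

# Toy stand-in for a commonsense semantic network: unordered pairs of
# related concepts. These links are invented for illustration only.
RELATED = {
    frozenset(("recognize", "speech")),
    frozenset(("beach", "sand")),
    frozenset(("beach", "ocean")),
}

# Function words ignored when testing plausibility (hypothetical list).
STOPWORDS = {"a", "an", "the", "to", "how", "of"}


def plausibility(words):
    """Return the fraction of content-word pairs linked in the toy network."""
    content = [w for w in words if w not in STOPWORDS]
    pairs = list(combinations(content, 2))
    if not pairs:
        return 0.0
    linked = sum(1 for a, b in pairs if frozenset((a, b)) in RELATED)
    return linked / len(pairs)


def rerank(hypotheses, weight=0.5):
    """Order (text, recognizer_score) pairs by a weighted mix of the
    recognizer score and the plausibility score, best first."""
    def combined(hypothesis):
        text, score = hypothesis
        return (1 - weight) * score + weight * plausibility(text.lower().split())
    return sorted(hypotheses, key=combined, reverse=True)


# Invented recognizer output: the acoustically preferred candidate comes first.
candidates = [
    ("how to wreck a nice beach", 0.55),
    ("how to recognize speech", 0.45),
]
best_text, _ = rerank(candidates)[0]
print(best_text)
```

With `weight=0.0` the original recognizer ordering is preserved, so the plausibility term acts purely as a supplement to the statistical ranking rather than a replacement for it.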