Interaction techniques using prosodic features of speech and audio localization

Authors:
Alex Olwal;Steven Feiner
Affiliations:
Columbia University, New York, NY and Royal Institute of Technology, Stockholm, Sweden;Columbia University, New York, NY
Venue:
Proceedings of the 10th international conference on Intelligent user interfaces
Year:
2005

Citing 5
Cited 4

Mutual disambiguation of recognition errors in a multimodal architecture

Proceedings of the SIGCHI conference on Human Factors in Computing Systems
Responding to subtle, fleeting changes in the user's internal state

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Voice as sound: using non-verbal voice input for interactive control

Proceedings of the 14th annual ACM symposium on User interface software and technology
Speech-based cursor control

Proceedings of the fifth international ACM conference on Assistive technologies
Unit: modular development of distributed interaction techniques for highly interactive user interfaces

Proceedings of the 2nd international conference on Computer graphics and interactive techniques in Australasia and South East Asia

The vocal joystick: a voice-based human-computer interface for individuals with motor impairments

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Blui: low-cost localized blowable user interfaces

Proceedings of the 20th annual ACM symposium on User interface software and technology
Compensate the Speech Recognition Delays for Accurate Speech-Based Cursor Position Control

Proceedings of the 13th International Conference on Human-Computer Interaction. Part II: Novel Interaction Methods and Techniques
Human-centered visualization environments

Human-centered visualization environments

Abstract

We describe several approaches for using prosodic features of speech and audio localization to control interactive applications. This information can be applied to parameter control, as well as to speech disambiguation. We discuss how characteristics of spoken sentences can be exploited in the user interface; for example, by considering the speed with which a sentence is spoken and the presence of extraneous utterances. We also show how coarse audio localization can be used for low-fidelity gesture tracking, by inferring the speaker's head position.
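
The ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the rate thresholds (`slow`, `fast`), and the loudest-microphone rule for localization are all assumptions chosen for illustration. It shows how the speed with which a sentence is spoken could drive a continuous parameter, and how a coarse position could be inferred from per-microphone signal levels.

```python
def speaking_rate(word_count, duration_s):
    """Words per second for one recognized utterance."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return word_count / duration_s


def rate_to_parameter(rate, slow=1.0, fast=4.0):
    """Linearly map a speaking rate onto [0, 1].

    `slow` and `fast` are hypothetical calibration bounds in words
    per second; rates outside the range are clamped.
    """
    t = (rate - slow) / (fast - slow)
    return max(0.0, min(1.0, t))


def coarse_position(mic_rms):
    """Return the label of the microphone with the highest RMS level.

    This is an assumed stand-in for coarse audio localization: the
    speaker is taken to be nearest the loudest microphone.
    """
    return max(mic_rms, key=mic_rms.get)


# Example: a 5-word command spoken in 2 seconds, heard loudest on the right.
rate = speaking_rate(5, 2.0)
value = rate_to_parameter(rate)
side = coarse_position({"left": 0.21, "center": 0.35, "right": 0.62})
print(rate, value, side)
```

In such a scheme, the same spoken command could mean "move a little" or "move a lot" depending on how quickly it is said, while the coarse position estimate would act as a low-fidelity pointing gesture.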