An assistive bi-modal user interface integrating multi-channel speech recognition and computer vision

  • Authors:
  • Alexey Karpov, Andrey Ronzhin, Irina Kipyatkova

  • Affiliations:
  • St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, St. Petersburg, Russian Federation (all authors)

  • Venue:
  • HCII'11: Proceedings of the 14th International Conference on Human-Computer Interaction: Interaction Techniques and Environments - Volume Part II
  • Year:
  • 2011

Abstract

In this paper, we present a bi-modal user interface intended both to assist persons without hands or with physical disabilities of the hands/arms and to provide contactless HCI for able-bodied users. The user can manipulate a virtual mouse pointer by moving his/her head and communicate with the computer verbally, issuing speech commands instead of using standard input devices. Speech is a very useful modality for referring to objects and to actions on objects, whereas head pointing gesture/motion is a powerful modality for indicating spatial locations. The bi-modal interface integrates a tri-lingual system for multi-channel audio signal processing and automatic recognition of voice commands in English, French, and Russian, as well as a vision-based head detection/tracking system. It processes natural speech and head pointing movements in parallel and fuses both information streams into a unified multimodal command, where each modality carries its own semantic information: the head position yields 2D head/pointer coordinates, while the speech signal yields control commands. Testing of the bi-modal user interface and its comparison with contact-based pointing interfaces were carried out according to the methodology of ISO 9241-9.
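
The abstract does not give implementation details, but the fusion scheme it describes, in which head tracking supplies continuous 2D coordinates while speech supplies discrete control commands, can be sketched as a simple late-fusion loop. The sketch below is purely illustrative: the class and method names (`BimodalFusion`, `on_head_position`, `on_speech_command`) are assumptions rather than the authors' API, and a real system would also align the two channels in time.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MultimodalCommand:
    """A fused command: a recognized speech action bound to the
    pointer position derived from head tracking."""
    action: str   # e.g. "click", "double_click", "drag"
    x: float      # 2D pointer coordinates from the vision channel
    y: float

class BimodalFusion:
    """Hypothetical late-fusion engine (not the authors' code):
    the vision channel continuously updates the pointer position;
    the speech channel delivers recognized command labels; fusion
    binds the most recent pointer position to each command."""

    def __init__(self) -> None:
        self._x: float = 0.0
        self._y: float = 0.0

    def on_head_position(self, x: float, y: float) -> None:
        # Called by the head detection/tracking channel per video frame.
        self._x, self._y = x, y

    def on_speech_command(self, action: str) -> Optional[MultimodalCommand]:
        # Called by the speech channel when a voice command is recognized;
        # the language-specific word is assumed already mapped to a
        # language-independent action label.
        return MultimodalCommand(action=action, x=self._x, y=self._y)

if __name__ == "__main__":
    fusion = BimodalFusion()
    fusion.on_head_position(412.0, 305.5)    # head-tracker frame update
    cmd = fusion.on_speech_command("click")  # recognized voice command
    print(cmd)  # MultimodalCommand(action='click', x=412.0, y=305.5)
```

In this late-fusion design the two recognizers stay independent, which matches the abstract's description of parallel processing of the speech and head-motion streams; only the final binding step combines them into one multimodal command.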