Controlling your TV with gestures

  • Authors:
  • Ming-yu Chen (Carnegie Mellon University, Pittsburgh, PA, USA)
  • Lily Mummert (Intel Labs Pittsburgh, Pittsburgh, PA, USA)
  • Padmanabhan Pillai (Intel Labs Pittsburgh, Pittsburgh, PA, USA)
  • Alexander Hauptmann (Carnegie Mellon University, Pittsburgh, PA, USA)
  • Rahul Sukthankar (Intel Labs Pittsburgh and Carnegie Mellon University, Pittsburgh, PA, USA)

  • Venue:
  • Proceedings of the International Conference on Multimedia Information Retrieval
  • Year:
  • 2010

Quantified Score

Hi-index 0.01

Abstract

Vision-based user interfaces enable natural interaction modalities such as gestures. Such interfaces require computationally intensive video processing at low latency. We demonstrate an application that recognizes gestures to control TV operations. Accurate recognition is achieved by using a new descriptor called MoSIFT, which explicitly encodes optical flow together with appearance features. MoSIFT is computationally expensive: a sequential implementation runs 100 times slower than real time. To reduce latency sufficiently for interaction, the application is implemented on a runtime system that exploits the parallelism inherent in video understanding applications.
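The abstract's core idea is that MoSIFT pairs a SIFT-style appearance histogram with a histogram of optical-flow vectors, so a descriptor fires only where both texture and motion are informative. The toy sketch below is not the authors' implementation: it is a minimal NumPy illustration of that pairing, using block-wise Lucas-Kanade flow as the motion cue. All names (`mosift_like`, `lk_flow`), patch sizes, and bin counts are invented for illustration.

```python
import numpy as np

def lk_flow(prev, curr, step=4):
    """Estimate one Lucas-Kanade flow vector per step x step block.

    Solves the least-squares system A v = b per block, where A stacks the
    spatial gradients and b holds the negated temporal differences.
    Returns an (N, 2) array of (vx, vy) vectors.
    """
    gy, gx = np.gradient(prev.astype(float))      # spatial gradients
    it = curr.astype(float) - prev.astype(float)  # temporal derivative
    h, w = prev.shape
    vectors = []
    for y0 in range(0, h - step + 1, step):
        for x0 in range(0, w - step + 1, step):
            ix = gx[y0:y0 + step, x0:x0 + step].ravel()
            iy = gy[y0:y0 + step, x0:x0 + step].ravel()
            b = -it[y0:y0 + step, x0:x0 + step].ravel()
            A = np.stack([ix, iy], axis=1)
            v, *_ = np.linalg.lstsq(A, b, rcond=None)
            vectors.append(v)
    return np.array(vectors)

def orientation_histogram(vx, vy, bins=8):
    """SIFT-style orientation histogram, magnitude-weighted, L2-normalized."""
    mag = np.hypot(vx, vy)
    ang = np.arctan2(vy, vx) % (2 * np.pi)
    idx = np.minimum((ang / (2 * np.pi) * bins).astype(int), bins - 1)
    hist = np.bincount(idx, weights=mag, minlength=bins).astype(float)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def mosift_like(prev_patch, curr_patch, bins=8):
    """Concatenate an appearance histogram (spatial gradients of the current
    patch) with a motion histogram (Lucas-Kanade flow vectors), mirroring
    MoSIFT's appearance-plus-motion structure at toy scale."""
    gy, gx = np.gradient(curr_patch.astype(float))
    appearance = orientation_histogram(gx.ravel(), gy.ravel(), bins)
    flow = lk_flow(prev_patch, curr_patch)
    motion = orientation_histogram(flow[:, 0], flow[:, 1], bins)
    return np.concatenate([appearance, motion])

# Toy example: a bright square shifts two pixels to the right between frames.
prev = np.zeros((16, 16)); prev[4:12, 4:12] = 1.0
curr = np.zeros((16, 16)); curr[4:12, 6:14] = 1.0
desc = mosift_like(prev, curr)  # 8 appearance bins + 8 motion bins
```

In the real descriptor, both halves are computed over SIFT's grid of sub-regions around scale-space interest points, and a point is kept only if sufficient optical flow is present near it; that motion gating is what distinguishes MoSIFT from appearance-only SIFT applied to video frames.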