We present a novel dataset and novel algorithms for the problem of detecting activities of daily living (ADL) in first-person camera views. We have collected a dataset of 1 million frames of dozens of people performing unscripted, everyday activities. The dataset is annotated with activities, object tracks, hand positions, and interaction events. ADLs differ from typical actions in that they can involve long-scale temporal structure (making tea can take a few minutes) and complex object interactions (a fridge looks different when its door is open). We develop novel representations including (1) temporal pyramids, which generalize the well-known spatial pyramid to approximate temporal correspondence when scoring a model and (2) composite object models that exploit the fact that objects look different when being interacted with. We perform an extensive empirical evaluation and demonstrate that our novel representations produce a two-fold improvement over traditional approaches. Our analysis suggests that real-world ADL recognition is “all about the objects,” and in particular, “all about the objects being interacted with.”
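The temporal-pyramid idea can be illustrated with a minimal sketch (this is an assumption-laden illustration, not the authors' implementation): given per-frame feature histograms, the video is repeatedly split into finer temporal segments, each segment is pooled, and the pooled vectors are concatenated, so a classifier can score coarse-to-fine temporal correspondence. The segment counts, mean pooling, and function name `temporal_pyramid` here are all hypothetical choices.

```python
import numpy as np

def temporal_pyramid(frame_feats, levels=3):
    """Pool per-frame feature histograms over a temporal pyramid.

    frame_feats: (T, D) array of per-frame descriptors (e.g. object
    occurrence histograms). At pyramid level l the video is split
    into 2**l equal segments; each segment is mean-pooled, and all
    pooled vectors are concatenated. Level 0 alone is the ordinary
    orderless bag-of-features; deeper levels add coarse temporal
    localization (e.g. "kettle early, cup late" for making tea).
    """
    T, D = frame_feats.shape
    pooled = []
    for l in range(levels):
        # Segment boundaries for 2**l equal temporal bins.
        bounds = np.linspace(0, T, 2 ** l + 1).astype(int)
        for s, e in zip(bounds[:-1], bounds[1:]):
            seg = frame_feats[s:e] if e > s else np.zeros((1, D))
            pooled.append(seg.mean(axis=0))
    # Final descriptor has length D * (2**levels - 1).
    return np.concatenate(pooled)
```

With `levels=1` this reduces to a plain temporal average, so the pyramid strictly generalizes the orderless baseline the paper compares against.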