We propose a framework, consisting of several algorithms, for recognizing human activities that involve manipulating objects. The framework identifies the objects being manipulated and models the high-level tasks being performed accordingly. Realistic settings for such tasks pose several problems for computer vision, including sporadic occlusion by subjects, non-frontal poses, and objects with few local features. We show how size and segmentation information derived from depth data can address these challenges using simple and fast techniques. In particular, we show how to robustly and without supervision find the manipulating hand, properly detect and recognize objects, and use temporal information to fill the gaps between sporadically detected objects, all through careful inclusion of depth cues. We evaluate our approach on a challenging dataset of 12 kitchen tasks involving 24 objects, performed by 2 subjects. The entire framework yields 82%/84% precision (74%/83% recall) for task/object recognition. Our techniques significantly outperform the state of the art in activity/object recognition.
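The abstract mentions using temporal information to fill gaps between sporadically detected objects, but gives no implementation details. The sketch below is a minimal, hypothetical illustration of that general idea, not the paper's actual method: it assumes per-frame object labels with detector misses marked as None, and fills each miss by a sliding-window majority vote over nearby confident detections (the function name, window parameter, and voting scheme are all my assumptions).

```python
from collections import Counter

def fill_detection_gaps(frame_labels, window=15):
    """Temporal gap-filling for sporadic per-frame object detections.

    frame_labels: list of per-frame object labels, with None where the
    detector found nothing (e.g. the object was occluded by the hand).
    Returns a list of the same length, with each gap filled by a
    majority vote over confident detections within +/- `window` frames.
    """
    filled = list(frame_labels)
    n = len(frame_labels)
    for t in range(n):
        if filled[t] is not None:
            continue  # keep confident detections untouched
        lo, hi = max(0, t - window), min(n, t + window + 1)
        votes = Counter(l for l in frame_labels[lo:hi] if l is not None)
        if votes:
            filled[t] = votes.most_common(1)[0][0]  # most frequent nearby label
    return filled


# Example: a cup detected only sporadically while the hand occludes it.
frames = ["cup", None, None, "cup", None, "cup", None, None]
print(fill_detection_gaps(frames, window=2))  # -> all frames labeled "cup"
```

A majority vote is only one plausible smoothing choice; a temporal model such as an HMM over object labels would serve the same purpose while also weighting detections by confidence.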