Seeing the Objects Behind the Dots: Recognition in Videos from a Moving Camera

Authors:
Björn Ommer;Theodor Mader;Joachim M. Buhmann
Affiliations:
Department of EECS, University of California, Berkeley, USA;Department of Computer Science, ETH Zurich, Zurich, Switzerland;Department of Computer Science, ETH Zurich, Zurich, Switzerland
Venue:
International Journal of Computer Vision
Year:
2009

Abstract

Category-level object recognition, segmentation, and tracking in videos become highly challenging when applied to sequences from a hand-held camera that features extensive motion and zooming. An additional challenge is to develop a fully automatic video analysis system that works without manual initialization of a tracker or other human intervention, both during training and during recognition, despite background clutter and other distracting objects. Moreover, our working hypothesis is that category-level recognition is possible based only on an erratic, flickering pattern of interest point locations, without extracting additional features. Compositions of these points are tracked individually by estimating a parametric motion model. Groups of compositions segment a video frame into the various objects that are present and into background clutter. Objects can then be recognized and tracked based on the motion of their compositions and on the shape they form. Finally, the combination of this flow-based representation with an appearance-based one is investigated. Besides evaluating the approach on a challenging video categorization database with significant camera motion and clutter, we also demonstrate that it generalizes to action recognition in a natural way.
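
To make the idea of estimating a parametric motion model for a group of interest points concrete, the following is a minimal sketch and not the authors' method. The abstract does not say which parametric model is used, so the sketch assumes a simple scale-plus-translation model (chosen here because it covers panning and zooming) fitted by least squares to matched point locations in two frames. The function name `fit_scale_translation` and the assumption that point correspondences are already known are both hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def fit_scale_translation(src: List[Point], dst: List[Point]) -> Tuple[float, float, float]:
    """Least-squares fit of dst ~ s * src + (tx, ty).

    Illustrative assumption: a 3-parameter zoom-and-shift model, not the
    model from the paper. Returns (s, tx, ty).
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matched point pairs")
    n = len(src)

    # Centroids of the point group in both frames.
    mx = sum(p[0] for p in src) / n
    my = sum(p[1] for p in src) / n
    nx = sum(q[0] for q in dst) / n
    ny = sum(q[1] for q in dst) / n

    # Closed-form scale: correlation of the centered points divided by
    # the spread of the source points.
    num = sum((p[0] - mx) * (q[0] - nx) + (p[1] - my) * (q[1] - ny)
              for p, q in zip(src, dst))
    den = sum((p[0] - mx) ** 2 + (p[1] - my) ** 2 for p in src)
    if den == 0.0:
        raise ValueError("source points coincide; scale is undetermined")
    s = num / den

    # The translation maps the scaled source centroid onto the target centroid.
    return s, nx - s * mx, ny - s * my


# Usage: three points that are zoomed by 1.5 and shifted by (3, -1).
points_t0 = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
points_t1 = [(3.0, -1.0), (6.0, -1.0), (3.0, 2.0)]
s, tx, ty = fit_scale_translation(points_t0, points_t1)
```

A richer model, such as a full affine transform, would follow the same pattern with more parameters and a linear solve in place of the closed form.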