Building Models of Animals from Video

Authors:
Deva Ramanan;David A. Forsyth;Kobus Barnard
Affiliations:
IEEE;IEEE;IEEE
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2006

Quantified Score

Hi-index	0.14

Abstract

This paper argues that tracking, object detection, and model building are all similar activities. We describe a fully automatic system that builds 2D articulated models, known as pictorial structures, from videos of animals. The learned model can be used to detect the animal in the original video; in this sense, the system can be viewed as a generalized tracker (one that is capable of modeling objects while tracking them). The learned model can be matched to a visual library; here, the system can be viewed as a video recognition algorithm. The learned model can also be used to detect the animal in novel images; in this case, the system can be seen as a method for learning models for object recognition. We find that we can significantly improve the pictorial structures by augmenting them with a discriminative texture model learned from a texture library. We develop a novel texture descriptor that outperforms the state of the art for animal textures. We demonstrate the entire system on real video sequences of three different animals. We show that we can automatically track and identify the given animal. We use the learned models to recognize animals from two data sets: images taken by professional photographers from the Corel collection, and assorted images from the Web returned by Google. We demonstrate good performance on both data sets. Comparing our results with simple baselines, we show that, for the Google set, we can detect, localize, and recover part articulations from a collection that is demonstrably hard for object recognition.
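
For readers unfamiliar with pictorial structures, the sketch below illustrates the general idea of matching one to an image: each part has an appearance cost at every candidate location, each child part pays a deformation cost for straying from its ideal offset relative to its parent, and dynamic programming over the part tree finds the cheapest joint placement. This is a generic illustration, not the system described in the abstract. The function name, the Euclidean deformation cost, the `weight` parameter, and the toy data are all assumptions made for the example.

```python
from math import hypot


def match_pictorial_structure(parts, parent, unary, rest_offset, weight=1.0):
    """Find the min-cost placement of tree-structured parts (illustrative only).

    parts:       part names, ordered so every parent precedes its children.
    parent:      part -> parent part (the root maps to None).
    unary:       part -> {(x, y): appearance cost at that candidate location}.
    rest_offset: child part -> ideal (dx, dy) relative to its parent.
    weight:      hypothetical scale on the deformation cost.
    """
    def deform(child, child_loc, parent_loc):
        # Assumed cost: Euclidean distance from the ideal relative offset.
        dx, dy = rest_offset[child]
        return weight * hypot(child_loc[0] - parent_loc[0] - dx,
                              child_loc[1] - parent_loc[1] - dy)

    # cost[p][loc] = best cost of the subtree rooted at p when p sits at loc.
    cost = {p: dict(unary[p]) for p in parts}
    best_child = {}  # (child, parent_loc) -> best child location

    # Leaves-to-root pass: fold each child's best cost into its parent.
    for child in reversed(parts):
        par = parent[child]
        if par is None:
            continue
        for ploc in cost[par]:
            loc, c = min(
                ((cl, cc + deform(child, cl, ploc))
                 for cl, cc in cost[child].items()),
                key=lambda pair: pair[1],
            )
            cost[par][ploc] += c
            best_child[(child, ploc)] = loc

    # Root-to-leaves pass: read off the optimal placement.
    root = parts[0]
    root_loc = min(cost[root], key=cost[root].get)
    placement = {root: root_loc}
    for p in parts[1:]:
        placement[p] = best_child[(p, placement[parent[p]])]
    return placement, cost[root][root_loc]


# Toy example with two parts and two candidate locations each.
parts = ["torso", "head"]
parent = {"torso": None, "head": "torso"}
unary = {
    "torso": {(0, 0): 1.0, (10, 0): 0.5},
    "head": {(0, -5): 0.2, (10, -5): 2.0},
}
rest_offset = {"head": (0, -5)}
placement, total = match_pictorial_structure(parts, parent, unary, rest_offset)
print(placement, total)
```

In the toy example, the torso candidate with the lower appearance cost loses overall, because the only head location consistent with it has a poor appearance score. This is the sense in which the spatial terms couple the part detections.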