Video object matching across multiple independent views using local descriptors and adaptive learning

Authors:
Luis F. Teixeira; Luis Corte-Real
Affiliations:
INESC Porto, Faculdade de Engenharia, Universidade do Porto, Campus da FEUP, Rua Dr. Roberto Frias, No. 378, 4200-465 Porto, Portugal; INESC Porto, Faculdade de Engenharia, Universidade do Porto, Campus da FEUP, Rua Dr. Roberto Frias, No. 378, 4200-465 Porto, Portugal
Venue:
Pattern Recognition Letters
Year:
2009

Abstract

Object detection and tracking is an essential preliminary task in event analysis systems (e.g. visual surveillance). Typically, objects are extracted and tagged, forming representative tracks of their activity. Tagging is usually performed by probabilistic data association. However, in systems capturing disjoint areas it is often not possible to establish such associations, as data may have been collected at different times or in different locations. In this case, appearance matching is a valuable aid. We propose using bag-of-visterms, i.e. a histogram of quantized local feature descriptors, to represent and match tracked objects. This method has proven effective for object matching and classification in image retrieval applications, where descriptors can be extracted a priori. An important difference in event analysis systems is that relevant information is typically restricted to the foreground. Descriptors can therefore be extracted faster, approaching real-time requirements. Also, unlike in image retrieval, objects can change over time, so their models need to be updated continuously. Incremental or adaptive learning is used to tackle this problem. Using independent tracks of 30 different persons, we show that the bag-of-visterms representation effectively discriminates visual object tracks and that it is highly resilient to incorrect object segmentation. Additionally, this methodology allows the construction of scalable object models that can be used to match tracks across independent views.
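
To make the bag-of-visterms idea concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the abstract does not specify the descriptor type, the codebook, the similarity measure, or the learning rule. The toy 2-D descriptors, the three-word `codebook`, the histogram-intersection score, and the running-average `update_model` are all assumptions chosen only to illustrate quantization, histogram matching, and incremental model updating.

```python
from math import dist

def quantize(descriptor, codebook):
    """Index of the codeword nearest to the descriptor (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: dist(descriptor, codebook[k]))

def bag_of_visterms(descriptors, codebook):
    """L1-normalized histogram of codeword occurrences for one observation."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[quantize(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def intersection(h1, h2):
    """Histogram intersection (assumed similarity): 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def update_model(model, hist, rate=0.1):
    """Running-average update, an assumed stand-in for adaptive learning."""
    return [(1.0 - rate) * m + rate * h for m, h in zip(model, hist)]

# Hypothetical 2-D descriptors; real local descriptors are much higher-dimensional.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
track_a = bag_of_visterms([(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.1)], codebook)
track_b = bag_of_visterms([(0.0, 0.9), (0.1, 1.1), (0.9, 0.0), (0.05, 1.0)], codebook)

print(track_a)                          # histogram for the first track
print(intersection(track_a, track_b))   # similarity between the two tracks
print(update_model(track_a, track_b, rate=0.5))  # model after one update
```

In this sketch a higher intersection score suggests that two tracks are more likely to show the same object, and the update rate controls how quickly a stored model follows appearance changes over time.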