Video retrieval using spatio-temporal descriptors

Authors:
Daniel DeMenthon; David Doermann
Affiliations:
University of Maryland, College Park, MD; University of Maryland, College Park, MD
Venue:
MULTIMEDIA '03 Proceedings of the eleventh ACM international conference on Multimedia
Year:
2003

Abstract

This paper describes a novel methodology for implementing video search functions such as retrieval of near-duplicate videos and recognition of actions in surveillance video. Videos are divided into half-second clips whose stacked frames produce 3D space-time volumes of pixels. Pixel regions with consistent color and motion properties are extracted from these 3D volumes by a threshold-free hierarchical space-time segmentation technique. Each region is then described by a high-dimensional point whose components represent the position, motion and, when possible, color of the region. In the indexing phase for a video database, these points are assigned labels that specify their video clip of origin. All the labeled points for all the clips are stored in a single binary tree for efficient $k$-nearest neighbor retrieval. The retrieval phase uses video segments as queries. Half-second clips of these queries are again segmented to produce sets of points, and for each point the labels of its nearest neighbors are retrieved. The labels that receive the largest numbers of votes correspond to the database clips that are the most similar to the query video segment. We illustrate this approach for video indexing and retrieval and for action recognition. First, we describe retrieval experiments for dynamic logos, and for video queries that differ from the indexed broadcasts by the addition of large overlays. Then we describe experiments in which office actions (such as pulling and closing drawers, taking and storing items, picking up and putting down a phone) are recognized. Color information is ignored to ensure independence from people's appearance. One of the distinct advantages of using this approach for action recognition is that there is no need for detection or recognition of body parts.
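The indexing and voting steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_index`, `k_nearest_labels`, `rank_clips`) are hypothetical, the descriptors are toy 3-D values rather than the paper's position, motion and color components, and a brute-force distance scan stands in for the binary tree that the paper uses for efficient $k$-nearest neighbor retrieval.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def build_index(clips):
    """Flatten {clip_label: [descriptor, ...]} into (descriptor, label) pairs.

    Assumption: each descriptor is a fixed-length tuple of floats.
    """
    return [(tuple(p), label) for label, points in clips.items() for p in points]


def k_nearest_labels(index, query_point, k):
    """Return the clip labels of the k indexed points closest to query_point.

    Brute-force scan, used here only for clarity; the paper stores the
    points in a binary tree to make this lookup efficient.
    """
    ranked = sorted(index, key=lambda entry: dist(entry[0], query_point))
    return [label for _, label in ranked[:k]]


def rank_clips(index, query_points, k=3):
    """Let every query point vote for the labels of its k nearest neighbors."""
    votes = Counter()
    for q in query_points:
        votes.update(k_nearest_labels(index, q, k))
    # Clips with the most votes are considered most similar to the query.
    return votes.most_common()


# Toy database: two clips, three region descriptors each (hypothetical values).
database = {
    "clip_A": [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.0, 0.1, 0.9)],
    "clip_B": [(5.0, 5.0, -1.0), (5.1, 5.0, -1.0), (5.0, 5.2, -0.9)],
}
index = build_index(database)

# Query clip whose region descriptors lie close to those of clip_A.
query = [(0.05, 0.05, 1.0), (0.0, 0.0, 0.95)]
ranking = rank_clips(index, query, k=2)
print(ranking)
```

In this toy setup both query points have their two nearest neighbors in `clip_A`, so that label collects every vote. With real data, a spatial tree such as a k-d tree would replace the sorted scan, and the value of `k` would need tuning.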