MST-CSS (Multi-Spectro-Temporal Curvature Scale Space), a Novel Spatio-Temporal Representation for Content-Based Video Retrieval

Authors:
A. Dyana;S. Das
Affiliations:
Dept. of Comput. Sci. & Eng., Indian Inst. of Technol. Madras, Chennai, India;-
Venue:
IEEE Transactions on Circuits and Systems for Video Technology
Year:
2010

Citing 0
Cited 1

An Improved Hierarchical Dirichlet Process-Hidden Markov Model and Its Application to Trajectory Modeling and Retrieval

International Journal of Computer Vision

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present a novel spatio-temporal descriptor to efficiently represent a video object for the purpose of content-based video retrieval. Features from spatial along with temporal information are integrated in a unified framework for the purpose of retrieval of similar video shots. A sequence of orthogonal processing, using a pair of 1-D multiscale and multispectral filters, on the space-time volume (STV) of a video object (VOB) produces a gradually evolving (smoother) surface. Zero-crossing contours (2-D) computed using the mean curvature on this evolving surface are stacked in layers to yield a hilly (3-D) surface, for a joint multispectro-temporal curvature scale space (MST-CSS) representation of the video object. Peaks and valleys (saddle points) are detected on the MST-CSS surface for feature representation and matching. Computation of the cost function for matching a query video shot with a model involves matching a pair of 3-D point sets, with their attributes (local curvature), and 3-D orientations of the finally smoothed STV surfaces. Experiments have been performed with simulated and real-world video shots using precision-recall metric for our performance study. The system is compared with a few state-of-the-art methods, which use shape and motion trajectory for VOB representation. Our unified approach has shown better performance than other approaches that use combined match-costs obtained with separate shape and motion trajectory representations and our previous work on a simple joint spatio-temporal descriptor (3-D-CSS).