Video representation and retrieval using spatio-temporal descriptors and region relations

Authors:
Sotirios Chatzis;Anastasios Doulamis;Dimitrios Kosmopoulos;Theodora Varvarigou
Affiliations:
Department of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece;Department of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece;Institute of Informatics and Telecommunications, National Centre for Scientific Research Demokritos, Athens, Greece;Department of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
Venue:
ICANN'06 Proceedings of the 16th international conference on Artificial Neural Networks - Volume Part II
Year:
2006

Abstract

This paper describes a novel methodology for video summarization and representation. Video shots are processed in space-time as 3D volumes of pixels. Pixel regions with consistent color and motion properties are extracted from these volumes by a space-time segmentation technique based on a novel machine learning algorithm. Each region is then described by a high-dimensional point whose components represent the region's average position, motion velocity, and color. Subsequently, the spatio-temporal relations between the regions are deduced and a graph-based description of them is generated. This graph, together with the region centroids, forms a concise yet powerful description of the video shot and is used for retrieval applications. Retrieval is formulated as an inexact graph matching problem between the video shots in the database and the query, which is itself a video segment. Experimental results on action recognition and video retrieval are presented and discussed.
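
The pipeline outlined in the abstract (region descriptors, a relation graph, inexact matching) might be sketched roughly as follows. This is an illustrative toy, not the authors' algorithm: the segmentation is assumed to be given as a label volume, and the function names, the sign-based relation encoding, the `tol` and `edge_penalty` parameters, and the brute-force search are all assumptions made for this example.

```python
from itertools import permutations
from math import dist


def region_descriptors(labels, colors, flows):
    """Average (x, y, t, vx, vy, r, g, b) over the voxels of each region.

    labels[t][y][x] is a region id, colors[t][y][x] an (r, g, b) tuple and
    flows[t][y][x] a (vx, vy) tuple. The segmentation itself is assumed given.
    """
    sums = {}
    for t, frame in enumerate(labels):
        for y, row in enumerate(frame):
            for x, rid in enumerate(row):
                feat = (x, y, t) + tuple(flows[t][y][x]) + tuple(colors[t][y][x])
                acc = sums.setdefault(rid, [0, [0.0] * 8])
                acc[0] += 1
                acc[1] = [a + f for a, f in zip(acc[1], feat)]
    return {rid: tuple(s / n for s in total) for rid, (n, total) in sums.items()}


def relation(a, b, tol=0.5):
    """Hypothetical coarse relation: sign of the centroid offset in x, y and t."""
    def sign(d):
        return 0 if abs(d) <= tol else (1 if d > 0 else -1)
    return tuple(sign(b[i] - a[i]) for i in range(3))


def relation_graph(desc, tol=0.5):
    """Directed graph: one labelled edge per ordered pair of distinct regions."""
    return {(i, j): relation(desc[i], desc[j], tol)
            for i in desc for j in desc if i != j}


def match_cost(query, shot, edge_penalty=1.0):
    """Brute-force inexact matching cost (exponential; tiny graphs only).

    Tries every injective mapping of query regions onto shot regions. Node cost
    is the Euclidean distance between motion and color components; each edge
    whose relation label differs adds edge_penalty.
    """
    q_ids, s_ids = list(query), list(shot)
    if len(q_ids) > len(s_ids):
        return float("inf")
    q_graph, s_graph = relation_graph(query), relation_graph(shot)
    best = float("inf")
    for perm in permutations(s_ids, len(q_ids)):
        m = dict(zip(q_ids, perm))
        node = sum(dist(query[q][3:], shot[m[q]][3:]) for q in q_ids)
        edge = sum(edge_penalty for (i, j), r in q_graph.items()
                   if s_graph[(m[i], m[j])] != r)
        best = min(best, node + edge)
    return best
```

Under these assumptions, shots would be ranked by ascending `match_cost` against the query. A practical system would replace the exhaustive search with an approximate graph matching method, since the number of candidate mappings grows factorially with the number of regions.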