Concept detection and keyframe extraction using a visual thesaurus

Authors:
Evaggelos Spyrou;Giorgos Tolias;Phivos Mylonas;Yannis Avrithis
Affiliations:
School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece 157 73;School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece 157 73;School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece 157 73;School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece 157 73
Venue:
Multimedia Tools and Applications
Year:
2009

Abstract

This paper presents a video analysis approach based on concept detection and keyframe extraction that employs a visual thesaurus representation. Color and texture descriptors are extracted from coarse regions of each frame, and a visual thesaurus is constructed by clustering these regions. The resulting clusters, called region types, serve as a basis for representing local material information: a model vector is constructed for each frame, reflecting the composition of the image in terms of region types. The model vector representation is used for keyframe selection, either within each video shot or across an entire sequence, and the selection process ensures that all region types are represented. A number of high-level concept detectors are then trained using global annotation, and Latent Semantic Analysis is applied. To enhance detection performance per shot, detection is applied to the selected keyframes of each shot, and a framework is proposed for working on very large data sets.
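The pipeline described in the abstract (cluster regions into region types, describe each frame by a model vector, then pick keyframes so that every region type is represented) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not say which clustering algorithm, model vector definition, or selection strategy is used. The sketch substitutes plain k-means for the clustering step, a normalized count of nearest region types for the model vector, and a greedy covering heuristic for keyframe selection. The function names `build_thesaurus`, `model_vector`, and `select_keyframes` are hypothetical, and region descriptors are assumed to be fixed-length NumPy vectors.

```python
import numpy as np


def build_thesaurus(regions, k, iters=20, seed=0):
    """Cluster region descriptors with plain k-means (an assumed stand-in).

    The returned centroids play the role of the 'region types'.
    """
    regions = np.asarray(regions, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct regions (fancy indexing copies).
    centroids = regions[rng.choice(len(regions), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(regions[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = regions[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


def model_vector(frame_regions, centroids):
    """Fraction of a frame's regions assigned to each region type (assumed form)."""
    frame_regions = np.asarray(frame_regions, dtype=float)
    dists = np.linalg.norm(frame_regions[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    counts = np.bincount(labels, minlength=len(centroids))
    return counts / counts.sum()


def select_keyframes(model_vectors):
    """Greedy cover: add frames until every occurring region type is represented."""
    present = np.asarray(model_vectors) > 0   # frame x region-type occurrence
    uncovered = present.any(axis=0)           # types that occur in some frame
    chosen = []
    while uncovered.any():
        # Pick the frame that covers the most still-uncovered region types.
        gains = (present & uncovered).sum(axis=1)
        best = int(gains.argmax())
        chosen.append(best)
        uncovered &= ~present[best]
    return chosen
```

In this sketch, a shot would be processed by calling `model_vector` on the region descriptors of each frame and passing the stacked results to `select_keyframes`, which returns frame indices. The greedy heuristic guarantees coverage of all occurring region types but not a minimal keyframe set. Concept detector training and Latent Semantic Analysis are outside the scope of the sketch.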