Image and video indexing using networks of operators

Authors:
Stéphane Ayache;Georges Quénot;Jérôme Gensel
Affiliations:
Multimedia Information Retrieval (MRIM) Group of LIG, Laboratoire d'Informatique de Grenoble, Grenoble, France;Multimedia Information Retrieval (MRIM) Group of LIG, Laboratoire d'Informatique de Grenoble, Grenoble, France;Spatio-Temporal Information, Adaptability, Multimédia and Knowledge Représentation (STEAMER) Group of LIG, Laboratoire d'Informatique de Grenoble, Grenoble, France
Venue:
Journal on Image and Video Processing
Year:
2007

Abstract

This article presents a framework for the design of concept detection systems for image and video indexing. The framework integrates all data and processing types in a homogeneous way. The semantic gap is crossed in a number of steps, each producing a small increase in the abstraction level of the handled data. All the data inside the semantic gap, and on both sides of it, are seen as a homogeneous type called numcept, and all the processing modules between the various numcepts are seen as a homogeneous type called operator. Concepts are extracted from the raw signal using networks of operators operating on numcepts. These networks can be represented as data-flow graphs, and the introduced homogenizations allow elements to be fused regardless of their nature: low-level descriptors can be fused with intermediate or final concepts. The framework has been used to build a variety of indexing networks for images and videos and to evaluate many aspects of them. Using the annotated corpora and protocols of the 2003 to 2006 TRECVID evaluation campaigns, the benefit brought by the use of individual features, of several modalities, of various fusion strategies, and of topologic and conceptual contexts was measured. The framework proved efficient for the design and evaluation of a series of network architectures while factorizing the training effort for common subnetworks.
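
The idea of a network of operators acting on numcepts can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Numcept` alias, the `Operator` class, the toy operators (`mean_op`, `spread_op`, `linear_op`), the concept names and all weights are hypothetical, and a numcept is assumed here to be a flat vector of floats. The sketch only shows how a data-flow graph can fuse a low-level descriptor with intermediate concepts, and how a shared subnetwork is computed once.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Numcept = List[float]  # assumption: every datum is a flat vector of floats


@dataclass
class Operator:
    """One node of the data-flow graph: input numcepts -> one output numcept."""
    name: str
    fn: Callable[[Sequence[Numcept]], Numcept]
    inputs: List["Operator"] = field(default_factory=list)

    def evaluate(self, raw: Numcept, cache: Dict[str, Numcept]) -> Numcept:
        # A node shared by several consumers is computed once, then reused.
        if self.name not in cache:
            # Nodes without inputs read the raw signal directly.
            args = [op.evaluate(raw, cache) for op in self.inputs] or [raw]
            cache[self.name] = self.fn(args)
        return cache[self.name]


def mean_op(args: Sequence[Numcept]) -> Numcept:
    """Toy low-level descriptor: mean of the raw signal."""
    x = args[0]
    return [sum(x) / len(x)]


def spread_op(args: Sequence[Numcept]) -> Numcept:
    """Toy low-level descriptor: range (max minus min) of the raw signal."""
    x = args[0]
    return [max(x) - min(x)]


def linear_op(weights: Sequence[float], bias: float = 0.0):
    """Toy stand-in for a trained classifier: clamped weighted sum of inputs."""
    def fn(args: Sequence[Numcept]) -> Numcept:
        x = [v for a in args for v in a]  # fusion by concatenation
        score = bias + sum(w * v for w, v in zip(weights, x))
        return [min(1.0, max(0.0, score))]
    return fn


def build_network() -> Operator:
    mean = Operator("mean", mean_op)
    spread = Operator("spread", spread_op)
    # Intermediate concepts, each derived from one descriptor.
    bright = Operator("bright", linear_op([1.0]), [mean])
    contrasted = Operator("contrasted", linear_op([1.0]), [spread])
    # Final concept fuses a low-level descriptor with two intermediate concepts.
    return Operator("final", linear_op([0.2, 0.4, 0.4]), [mean, bright, contrasted])


if __name__ == "__main__":
    cache: Dict[str, Numcept] = {}
    score = build_network().evaluate([0.2, 0.4, 0.6, 0.8], cache)
    print(score, sorted(cache))
```

Because every node consumes and produces the same type, the final operator does not need to know whether an input is a descriptor or a concept score. The cache is a small-scale analogue of factorizing effort for common subnetworks: `mean` feeds both `bright` and `final` but is evaluated only once.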