Conceptual structures and computational methods for indexing and organization of visual information

Authors:
Shih-Fu Chang;Alejandro Jaimes
Affiliations:
-;-
Venue:
Conceptual structures and computational methods for indexing and organization of visual information
Year:
2003

Citing 0
Cited 10

Memory cues for meeting video retrieval

Proceedings of the 1st ACM workshop on Continuous archival and retrieval of personal experiences
Detecting image near-duplicate by stochastic attributed relational graph matching with learning

Proceedings of the 12th annual ACM international conference on Multimedia
Sit straight (and tell me what I did today): a human posture alarm and activity summarization system

CARPE '05 Proceedings of the 2nd ACM workshop on Continuous archival and retrieval of personal experiences
A component-based multimedia data model

Proceedings of the ACM workshop on Multimedia for human communication: from capture to convey
Practical elimination of near-duplicates from web video search

Proceedings of the 15th international conference on Multimedia
Real-time near-duplicate elimination for web video search with content and context

IEEE Transactions on Multimedia - Special issue on integration of context and content
Modal keywords, ontologies, and reasoning for video understanding

CIVR'03 Proceedings of the 2nd international conference on Image and video retrieval
On the image content of a web segment: Chile as a case study

Journal of Web Engineering
Bayesian method for motion segmentation and tracking in compressed videos

PR'05 Proceedings of the 27th DAGM conference on Pattern Recognition
Visual trigger templates for knowledge-based indexing

PCM'04 Proceedings of the 5th Pacific Rim Conference on Advances in Multimedia Information Processing - Volume Part II

Abstract

We address the problem of automatic indexing and organization of visual information through user interaction at multiple levels. Our work focuses on three areas: (1) understanding visual content and the way users search and index it; (2) constructing flexible computational methods that learn to automatically classify images and videos from user input at multiple levels; (3) integrating generic visual detectors to solve practical tasks in the specific domain of consumer photography. In particular, we present the following: (1) novel conceptual structures for classifying visual attributes (the Multi-Level Indexing Pyramid); (2) a novel framework for learning structured visual detectors from user input (the Visual Apprentice); (3) a new study of human eye movements in observing images of different visual categories; (4) a new framework for the detection of non-identical duplicate consumer photographs in an interactive consumer image organization system; (5) a detailed study of duplicate consumer photographs.
In the Visual Apprentice (VA), a user first defines a model via a multiple-level definition hierarchy (a scene consists of objects, object-parts, etc.). The user then labels example images or videos based on the hierarchy (for example, a handshake image contains two faces and a handshake), and visual features are extracted from each example. Finally, several machine learning algorithms are used to learn classifiers for different nodes of the hierarchy. The best classifiers and features are automatically selected to produce a Visual Detector (e.g., for a handshake), which is applied to new images or videos.
In the human eye tracking experiments, we examine variations in the way people look at images within and across different visual categories and explore ways of integrating eye tracking analysis with the VA framework.
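The Visual Apprentice workflow described above (label examples per hierarchy node, train several learners, keep the best one per node) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the two toy learners, the leave-one-out selection criterion, the node names, and the two-dimensional feature vectors are hypothetical and are not taken from the thesis. The sketch selects only among classifiers; feature selection, which the abstract also mentions, is omitted.

```python
import math

def train_nearest_centroid(examples):
    """Toy learner: predict the label whose mean feature vector is closest."""
    groups = {}
    for features, label in examples:
        groups.setdefault(label, []).append(features)
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }
    return lambda x: min(centroids, key=lambda lab: math.dist(x, centroids[lab]))

def train_one_nn(examples):
    """Toy learner: predict the label of the single closest training example."""
    return lambda x: min(examples, key=lambda ex: math.dist(x, ex[0]))[1]

# Hypothetical pool of learners to choose from.
LEARNERS = {"nearest_centroid": train_nearest_centroid, "one_nn": train_one_nn}

def leave_one_out_accuracy(train_fn, examples):
    """Fraction of examples predicted correctly when each is held out in turn."""
    hits = 0
    for i, (features, label) in enumerate(examples):
        model = train_fn(examples[:i] + examples[i + 1:])
        hits += model(features) == label
    return hits / len(examples)

def build_detector(labeled_by_node):
    """For each hierarchy node, keep the learner with the best held-out accuracy."""
    detector = {}
    for node, examples in labeled_by_node.items():
        scores = {
            name: leave_one_out_accuracy(fn, examples)
            for name, fn in LEARNERS.items()
        }
        best = max(scores, key=scores.get)
        detector[node] = (best, LEARNERS[best](examples))
    return detector

# Hypothetical labeled examples: (feature vector, is this node present?).
labeled = {
    "face": [([0.9, 0.1], True), ([0.8, 0.2], True),
             ([0.1, 0.9], False), ([0.2, 0.8], False)],
    "handshake": [([0.7, 0.7], True), ([0.6, 0.8], True),
                  ([0.1, 0.1], False), ([0.2, 0.0], False)],
}
detector = build_detector(labeled)
learner_name, face_model = detector["face"]
print(learner_name, face_model([0.85, 0.15]))
```

Selecting a learner independently for each node mirrors the statement that classifiers are learned for different nodes of the hierarchy; how node-level outputs are combined into a scene-level decision is left out because the abstract does not specify it.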
Finally, we present a novel framework for the detection of non-identical duplicate consumer images for systems that help users automatically organize their collections. Our approach combines multiple strategies: knowledge about the geometry of multiple views of the same scene, the extraction of low-level features, the detection of objects using the VA, and domain knowledge.
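One simple way to picture a multiple-strategy combination is as a weighted vote over per-cue similarity scores for a pair of photographs. The sketch below is only an assumption about how such cues could be fused: the abstract does not say how the strategies are combined, and the cue names, weights, and threshold are all hypothetical.

```python
# Hypothetical cue weights; the source does not specify a combination rule.
WEIGHTS = {"geometry": 0.4, "low_level": 0.3, "objects": 0.2, "domain": 0.1}

def duplicate_score(cues, weights=WEIGHTS):
    """Weighted mean of per-cue similarity scores, each expected in [0, 1].

    Only the cues actually supplied contribute, so a missing cue
    (e.g. no object detector available) does not lower the score.
    """
    total = sum(weights[name] for name in cues)
    return sum(weights[name] * value for name, value in cues.items()) / total

def is_duplicate(cues, threshold=0.7):
    """Flag an image pair as a candidate duplicate (threshold is illustrative)."""
    return duplicate_score(cues) >= threshold

# Hypothetical scores for two image pairs.
similar_pair = {"geometry": 0.9, "low_level": 0.8, "objects": 0.9, "domain": 1.0}
different_pair = {"geometry": 0.1, "low_level": 0.4, "objects": 0.2, "domain": 0.0}
print(is_duplicate(similar_pair), is_duplicate(different_pair))
```

In an interactive organization system, pairs flagged this way would presumably be shown to the user as suggestions rather than removed automatically.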