Using visual-textual mutual information and entropy for inter-modal document indexing

Authors:
Jean Martinet;Shin'ichi Satoh
Affiliations:
National Institute of Informatics, Multimedia Information Research Division, Tokyo, Japan;National Institute of Informatics, Multimedia Information Research Division, Tokyo, Japan
Venue:
ECIR'07 Proceedings of the 29th European conference on IR research
Year:
2007


Abstract

This paper presents a statistical indexing model for automatic visual document indexing. The approach is based on inter-modal document analysis, which consists of modeling and learning relationships between several modalities from a data set of annotated documents in order to extract semantics. When one of the modalities is textual, the learned associations can be used to predict a textual index for the visual data of a new document (image or video). More specifically, the approach relies on a learning process in which associations between visual and textual information are characterized by the mutual information of the modalities. In addition, the model uses the information entropy of the distribution of the visual modality against the textual modality as a second source for selecting relevant indexing terms. We have implemented the proposed information-theoretic model, and experiments assessing its performance on two collections (image and video) show that information theory is an interesting framework for automatically annotating documents.
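
To make the two quantities named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the visual modality has been quantized into discrete clusters and that training data is available as a co-occurrence count matrix; both are assumptions, since the abstract does not describe the representation. The function names, the use of pointwise mutual information, the toy counts, and the rule of preferring low-entropy terms are all hypothetical choices made for illustration.

```python
import math


def association_scores(counts):
    """Compute association statistics from a co-occurrence matrix.

    counts[v][t] is the number of annotated training documents in which
    visual cluster v co-occurs with textual term t (hypothetical input).
    Returns (pmi, entropy), where pmi[v][t] is the pointwise mutual
    information in bits and entropy[t] is H(V | T = t) in bits.
    """
    n_v, n_t = len(counts), len(counts[0])
    total = sum(sum(row) for row in counts)
    p_v = [sum(row) / total for row in counts]
    p_t = [sum(counts[v][t] for v in range(n_v)) / total for t in range(n_t)]

    # PMI(v, t) = log2( p(v, t) / (p(v) * p(t)) ); -inf when never co-observed.
    pmi = [
        [
            math.log2((counts[v][t] / total) / (p_v[v] * p_t[t]))
            if counts[v][t] > 0
            else float("-inf")
            for t in range(n_t)
        ]
        for v in range(n_v)
    ]

    # Entropy of the visual-cluster distribution conditioned on each term.
    entropy = []
    for t in range(n_t):
        col_total = sum(counts[v][t] for v in range(n_v))
        h = 0.0
        for v in range(n_v):
            if counts[v][t] > 0:
                p = counts[v][t] / col_total
                h -= p * math.log2(p)
        entropy.append(h)
    return pmi, entropy


def predict_terms(visual_clusters, pmi, entropy, max_entropy):
    """Rank candidate index terms for a new document.

    Assumption (not from the source): a term is kept only if its visual
    distribution has entropy <= max_entropy and its best PMI with one of
    the observed clusters is positive.
    """
    scored = []
    for t in range(len(entropy)):
        if entropy[t] > max_entropy:
            continue
        best = max(pmi[v][t] for v in visual_clusters)
        if best > 0:
            scored.append((best, t))
    # Highest association first; return term indices only.
    return [t for _, t in sorted(scored, reverse=True)]


# Toy data: rows are visual clusters, columns are terms
# (term 0 and term 1 are cluster-specific, term 2 is spread evenly).
toy_counts = [[8, 0, 4],
              [0, 8, 4]]
toy_pmi, toy_entropy = association_scores(toy_counts)
print(predict_terms([0], toy_pmi, toy_entropy, max_entropy=0.5))
```

In this toy setup the third term co-occurs equally with both clusters, so its conditional entropy is high and it is filtered out as uninformative. Whether the paper selects terms by low or high entropy is not stated in the abstract, so the threshold direction here should be read as an assumption.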