Conventional methods for multimodal data retrieval use text-tag-based or cross-modal approaches such as tag-image co-occurrence and canonical correlation analysis. However, because text and image features differ in granularity, approaches based on lower-order relationships between modalities may be limited. Here, we propose a novel text and image keyword generation method based on cross-modal associative learning and inference with multimodal queries. We use a modified hypernetwork model, the layered hypernetwork (LHN), which consists of a first (lower) layer containing two or more modality-dependent hypernetworks and a second (upper) layer containing one modality-integrating hypernetwork. LHNs learn higher-order associative relationships between the text and image modalities by training on an example set. After training, an LHN extends multimodal queries by generating text and image keywords via cross-modal inference, i.e., text-to-image and image-to-text. The LHN is evaluated on Korean magazine articles with images about women's fashion and lifestyle. Experimental results show that the proposed method generates vision-language cross-modal keywords with high accuracy. The results also show that multimodal queries improve the accuracy of keyword generation compared with unimodal ones.
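To make the two-layer structure and the cross-modal query extension concrete, the sketch below shows one possible organization, assuming bag-of-words text keywords and quantized visual words as image features. The class names (Hypernetwork, LayeredHypernetwork), the hyperedge sizes, the sampling scheme, and the voting-based inference are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a layered hypernetwork (LHN) for cross-modal keyword
# generation. All parameters and helper names here are hypothetical.
import random
from collections import Counter


class Hypernetwork:
    """Stores fixed-size hyperedges sampled from training examples."""

    def __init__(self, edge_size=3, edges_per_example=20, seed=0):
        self.edge_size = edge_size
        self.edges_per_example = edges_per_example
        self.rng = random.Random(seed)
        self.edges = []  # list of (frozenset(features), payload)

    def train(self, examples):
        # Each example is (features, payload); the payload carries the
        # features of the other modality (or both, for the integrating layer).
        for features, payload in examples:
            feats = list(features)
            if len(feats) < self.edge_size:
                continue
            for _ in range(self.edges_per_example):
                edge = frozenset(self.rng.sample(feats, self.edge_size))
                self.edges.append((edge, payload))

    def infer(self, query_features, top_k=5):
        # Higher-order (k-wise) matching: every hyperedge fully covered by
        # the query votes for its associated payload keywords.
        query = set(query_features)
        votes = Counter()
        for edge, payload in self.edges:
            if edge <= query:
                votes.update(payload)
        return [f for f, _ in votes.most_common(top_k)]


class LayeredHypernetwork:
    """Lower layer: one hypernetwork per modality (text, image).
    Upper layer: one modality-integrating hypernetwork over combined features."""

    def __init__(self):
        self.text_hn = Hypernetwork(edge_size=2)   # text -> image keywords
        self.image_hn = Hypernetwork(edge_size=2)  # image -> text keywords
        self.joint_hn = Hypernetwork(edge_size=3)  # integrated keywords

    def train(self, corpus):
        # corpus: list of (text_words, image_words) pairs, e.g. article
        # keywords paired with the visual words of the article's images.
        self.text_hn.train([(t, i) for t, i in corpus])
        self.image_hn.train([(i, t) for t, i in corpus])
        self.joint_hn.train([(set(t) | set(i), set(t) | set(i)) for t, i in corpus])

    def extend_query(self, text_query=(), image_query=(), top_k=5):
        # Multimodal query extension: cross-modal keywords from the lower
        # layer plus integrated keywords from the upper layer.
        out = set(self.text_hn.infer(text_query, top_k))
        out |= set(self.image_hn.infer(image_query, top_k))
        out |= set(self.joint_hn.infer(set(text_query) | set(image_query), top_k))
        return out - set(text_query) - set(image_query)


if __name__ == "__main__":
    corpus = [
        ({"dress", "summer", "linen"}, {"vw_12", "vw_40", "vw_7"}),
        ({"coat", "winter", "wool"}, {"vw_3", "vw_40", "vw_21"}),
    ]
    lhn = LayeredHypernetwork()
    lhn.train(corpus)
    print(lhn.extend_query(text_query={"dress", "summer"}, image_query={"vw_12", "vw_7"}))
```

In this sketch, supplying both a text and an image query lets all three hypernetworks contribute votes, which mirrors the paper's observation that multimodal queries yield more accurate keyword generation than unimodal ones.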