Layered hypernetwork models for cross-modal associative text and image keyword generation in multimodal information retrieval

  • Authors:
  • Jung-Woo Ha; Byoung-Hee Kim; Bado Lee; Byoung-Tak Zhang

  • Affiliations:
  • Biointelligence Lab, School of Computer Science and Engineering, Seoul National University, Seoul, Korea (all four authors)

  • Venue:
  • PRICAI'10: Proceedings of the 11th Pacific Rim International Conference on Trends in Artificial Intelligence
  • Year:
  • 2010

Abstract

Conventional methods for multimodal data retrieval use text-tag-based or cross-modal approaches such as tag-image co-occurrence and canonical correlation analysis. However, because text and image features differ in granularity, approaches based on lower-order relationships between modalities may be limited. Here, we propose a novel method for generating text and image keywords through cross-modal associative learning and inference with multimodal queries. We use a modified hypernetwork model, layered hypernetworks (LHNs), in which the first (lower) layer consists of multiple modality-dependent hypernetworks and the second (upper) layer consists of one modality-integrating hypernetwork. LHNs learn higher-order associative relationships between the text and image modalities from a set of training examples. After training, LHNs extend multimodal queries by generating text and image keywords via cross-modal inference, i.e., text-to-image and image-to-text. The LHNs are evaluated on Korean magazine articles with images on women's fashion and lifestyle. Experimental results show that the proposed method generates vision-language cross-modal keywords with high accuracy. The results also show that multimodal queries improve the accuracy of keyword generation compared with unimodal ones.
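The two-layer structure and the cross-modal inference described above can be illustrated with a toy sketch. It is a hypothetical simplification, not the authors' LHN: it assumes text and images have already been reduced to sets of discrete tokens (words and "visual words"), samples fixed-order hyperedges uniformly at random from each example, weights hyperedges by raw co-occurrence counts, and uses names (`ToyLayeredHypernetwork`, `generate`, `min_count`) invented for illustration.

```python
import random
from collections import Counter


def sample_hyperedges(features, order, n_samples, rng):
    """Randomly sample n_samples hyperedges (feature subsets of size `order`)."""
    feats = sorted(features)  # sort so sampling is reproducible for a fixed seed
    if len(feats) < order:
        return []
    return [frozenset(rng.sample(feats, order)) for _ in range(n_samples)]


class ToyLayeredHypernetwork:
    """Hypothetical two-layer hypernetwork sketch (assumption, not the paper's LHN)."""

    def __init__(self, order=2, samples_per_example=10, min_count=1, seed=0):
        self.order = order
        self.samples = samples_per_example
        self.min_count = min_count
        self.rng = random.Random(seed)
        self.text_hn = Counter()   # lower layer: text-only hyperedges
        self.image_hn = Counter()  # lower layer: image-only hyperedges
        self.cross_hn = Counter()  # upper layer: (text edge, image edge) pairs

    def fit(self, examples):
        """examples: iterable of (text_tokens, image_tokens) frozenset pairs."""
        sampled = []
        for text, image in examples:
            t_edges = sample_hyperedges(text, self.order, self.samples, self.rng)
            i_edges = sample_hyperedges(image, self.order, self.samples, self.rng)
            self.text_hn.update(t_edges)
            self.image_hn.update(i_edges)
            sampled.append((t_edges, i_edges))
        # Upper layer: link lower-layer hyperedges that co-occur in one example,
        # keeping only those seen at least min_count times in their own layer.
        for t_edges, i_edges in sampled:
            kept_t = {t for t in t_edges if self.text_hn[t] >= self.min_count}
            kept_i = {i for i in i_edges if self.image_hn[i] >= self.min_count}
            for t in kept_t:
                for i in kept_i:
                    self.cross_hn[(t, i)] += 1
        return self

    def generate(self, text_query=frozenset(), image_query=frozenset(), k=3):
        """Return (text_keywords, image_keywords) not already in the query."""
        text_scores, image_scores = Counter(), Counter()
        for (t, i), weight in self.cross_hn.items():
            # An upper-layer edge fires if either side is fully contained in the
            # query; edges consistent with both query modalities count twice.
            matched = (t <= text_query) + (i <= image_query)
            if not matched:
                continue
            for token in t - text_query:
                text_scores[token] += weight * matched
            for token in i - image_query:
                image_scores[token] += weight * matched
        return _top_k(text_scores, k), _top_k(image_scores, k)


def _top_k(scores, k):
    """Return the k highest-scoring tokens."""
    return [token for token, _ in scores.most_common(k)]


if __name__ == "__main__":
    data = [
        (frozenset({"dress", "summer"}), frozenset({"red", "floral"})),
        (frozenset({"coat", "winter"}), frozenset({"gray", "wool"})),
    ]
    lhn = ToyLayeredHypernetwork().fit(data)
    print(lhn.generate(text_query=frozenset({"dress", "summer"})))  # text-to-image
    print(lhn.generate(image_query=frozenset({"gray", "wool"})))    # image-to-text
```

In this sketch a text-only query fires edges through their text side and yields only image keywords (text-to-image), an image-only query yields text keywords, and a multimodal query can fire edges through either modality while boosting edges consistent with both. The paper's actual hyperedge construction, weighting, and inference rules may differ.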