Humans can associate the vision and language modalities and thus generate mental imagery, i.e., visual images, from linguistic input, even in an environment of unlimited inflowing information. Inspired by human memory, we separate the text-to-image retrieval task into two steps: 1) text-to-image conversion, which generates a visual query for the second step, and 2) image-to-image retrieval. This separation is advantageous for visualizing internal representations, learning from incrementally growing datasets, and reusing the results of content-based image retrieval. Here, we propose a visual query expansion method that simulates the capability of human associative memory. We use a hypernetwork (HN) model that combines visual words and linguistic words. The HN learns higher-order cross-modal associative relationships incrementally from a sequence of image-text pairs. The incrementally trained HN generates an image by assembling visual words based on linguistic cues, and we then retrieve similar images with the generated visual query. The method is evaluated on 26 video clips of 'Thomas and Friends'. Experiments show a successful image retrieval rate of up to 98.1% with a single text cue, and the method shows additional potential to generate visual queries from several text cues simultaneously.
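The abstract describes an algorithmic pipeline: learn cross-modal hyperedges from image-text pairs, assemble a visual query from a text cue, then rank images against that query. The paper gives no pseudocode, so the following is a minimal, illustrative Python sketch under strong simplifying assumptions: images are pre-encoded as bags of visual words, hyperedges are small randomly sampled cross-modal subsets, and retrieval uses cosine similarity over visual-word histograms. All names (`IncrementalHypernetwork`, `sample_hyperedges`, `generate_visual_query`, `retrieve`) and parameter choices are hypothetical, not the authors' implementation.

```python
import random
from collections import Counter
from math import sqrt


def sample_hyperedges(visual_words, text_words, k=3, n_edges=20, rng=random):
    """Sample hyperedges: small cross-modal subsets of co-occurring words.

    Each hyperedge pairs a few visual words with one linguistic word, a
    simplified stand-in for the higher-order associations the HN learns.
    The edge size k and the number of edges are illustrative choices.
    """
    edges = []
    for _ in range(n_edges):
        vs = frozenset(rng.sample(visual_words, min(k, len(visual_words))))
        edges.append((vs, rng.choice(text_words)))
    return edges


class IncrementalHypernetwork:
    """Toy hypernetwork that grows one image-text pair at a time."""

    def __init__(self):
        self.edges = Counter()  # hyperedge -> observed frequency

    def learn(self, visual_words, text_words):
        # Incremental update: accumulate hyperedges sampled from the new
        # pair; the paper's evolutionary pruning/replacement is omitted.
        for edge in sample_hyperedges(visual_words, text_words):
            self.edges[edge] += 1

    def generate_visual_query(self, text_cues):
        # Step 1 (text-to-image conversion): assemble visual words from
        # hyperedges whose linguistic word matches any text cue, weighted
        # by edge frequency, into a bag-of-visual-words query.
        query = Counter()
        for (vs, t), freq in self.edges.items():
            if t in text_cues:
                for v in vs:
                    query[v] += freq
        return query


def cosine(a, b):
    # Cosine similarity between two visual-word histograms (Counters).
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, database, top_k=5):
    # Step 2 (image-to-image retrieval): rank database images, each
    # represented as a Counter of visual words, against the visual query.
    ranked = sorted(database.items(),
                    key=lambda kv: cosine(query, kv[1]), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    hn = IncrementalHypernetwork()
    # Toy data: each "image" is a list of quantized visual-word IDs.
    hn.learn(["v1", "v2", "v3", "v4"], ["thomas", "engine"])
    hn.learn(["v2", "v5", "v6", "v7"], ["james", "engine"])
    db = {"img_a": Counter(["v1", "v2", "v3"]), "img_b": Counter(["v5", "v6"])}
    print(retrieve(hn.generate_visual_query({"thomas"}), db, top_k=1))
```

Passing several cues at once, e.g. `generate_visual_query({"thomas", "engine"})`, pools visual words across all matching hyperedges, which corresponds to the multi-cue query generation the abstract mentions; the actual HN additionally evolves its hyperedge population over time, which this sketch does not model.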