A non-parametric visual-sense model of images--extending the cluster hypothesis beyond text

  • Authors:
  • Kong-Wah Wan (Institute for Infocomm Research, Singapore); Ah-Hwee Tan (School of Computer Engineering, Nanyang Technological University, Singapore); Joo-Hwee Lim (Institute for Infocomm Research, Singapore); Liang-Tien Chia (School of Computer Engineering, Nanyang Technological University, Singapore)

  • Venue:
  • Multimedia Tools and Applications
  • Year:
  • 2012


Abstract

The main challenge for a search engine is to find information that is relevant and appropriate. This becomes difficult, however, when queries are issued using ambiguous words. Rijsbergen first hypothesized a clustering approach wherein closely associated documents are treated as a semantic group with the same relevance to the query (Rijsbergen 1979). In this paper, we extend Rijsbergen's cluster hypothesis to multimedia content such as images. Given a user query, the polysemy in the returned image set reflects the many possible meanings of the query. We develop a method to cluster the polysemous images into their semantic categories. The resulting clusters can be seen as the visual senses of the query, which collectively embody its visual interpretations. At the heart of our method is a non-parametric Bayesian approach that exploits the complementary text and visual information of images for semantic clustering. Latent structures of polysemous images are mined using the Hierarchical Dirichlet Process (HDP), a non-parametric Bayesian model that represents images as a mixture of components. The main advantage of our model is that the number of mixture components is not fixed a priori but is determined during posterior inference. This allows the model to grow with the level of polysemy (and visual diversity) of the images. The same set of components is used to model all images, with only the mixture weights varying amongst the images. Evaluation results on a large collection of web images show the efficacy of our approach.
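The abstract's key property, that the number of mixture components is not fixed a priori but grows with the data, is characteristic of Dirichlet-process-based models such as the HDP. A minimal way to see this behavior is the Chinese Restaurant Process (CRP), the sequential sampling scheme underlying Dirichlet process mixtures. The sketch below is purely illustrative and is not the authors' model or implementation: item `i` joins existing cluster `k` with probability proportional to that cluster's size, or opens a new cluster with probability proportional to the concentration parameter `alpha`, so the number of clusters is determined by the data rather than set in advance.

```python
import random

def crp_partition(n, alpha, seed=0):
    """Sample a random partition of n items from a Chinese Restaurant
    Process with concentration parameter alpha.

    Customer i (0-indexed) joins existing table k with probability
    count_k / (i + alpha), or opens a new table with probability
    alpha / (i + alpha).
    """
    rng = random.Random(seed)
    tables = []       # number of customers seated at each table
    assignments = []  # table index chosen by each customer
    for i in range(n):
        # Draw a point in [0, i + alpha): the first i units of mass
        # are split among existing tables, the last alpha units open
        # a new table.
        r = rng.uniform(0, i + alpha)
        acc = 0.0
        for k, count in enumerate(tables):
            acc += count
            if r < acc:
                tables[k] += 1
                assignments.append(k)
                break
        else:
            tables.append(1)              # open a new table
            assignments.append(len(tables) - 1)
    return assignments, tables

assignments, tables = crp_partition(200, alpha=2.0)
print(len(tables))  # number of clusters induced by the data, not preset
```

In an HDP the same idea is applied hierarchically: one shared set of components (tables) is used across all groups (here, images), with only the mixture weights varying per group, which matches the abstract's description of the model.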