Multilayer pLSA for multimodal image retrieval

  • Authors:
  • Rainer Lienhart, Stefan Romberg, Eva Hörster

  • Affiliation:
  • University of Augsburg, Augsburg, Germany (all authors)

  • Venue:
  • Proceedings of the ACM International Conference on Image and Video Retrieval
  • Year:
  • 2009

Abstract

It is the current state of knowledge that the human neocortex consists of six layers [10]. We take this insight from neuroscience as inspiration to extend the standard single-layer probabilistic Latent Semantic Analysis (pLSA) [13] to multiple layers. As multiple layers naturally handle multiple modalities and a hierarchy of abstractions, we denote this new approach multilayer multimodal probabilistic Latent Semantic Analysis (mm-pLSA). We derive the training and inference rules for the smallest possible non-degenerate mm-pLSA model: a model with two leaf-pLSAs (here fed by two different data modalities: image tags and visual image features) and a single top-level pLSA node merging the two leaf-pLSAs. From this derivation it is straightforward to extend the learning and inference rules to more modalities and more layers. We also propose a fast, strictly stepwise forward procedure to initialize the mm-pLSA model bottom-up; the result can then be post-optimized by the general mm-pLSA learning algorithm. We evaluate the proposed approach experimentally in a query-by-example retrieval task using 50-dimensional topic vectors as image models. We compare several variants of our mm-pLSA system to systems relying solely on visual features or tag features and analyze possible pitfalls of mm-pLSA training. The best variant of the proposed mm-pLSA system outperforms the unimodal systems by approximately 19% in our query-by-example task.
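
To make the layered architecture concrete, the sketch below shows one simple reading of the idea, not the authors' exact derivation or post-optimization: two leaf pLSAs (one per modality) are fit independently by EM, their per-image topic activations P(z|d) are concatenated and fed as soft "word" counts into a single top-level pLSA node, and retrieval ranks images by similarity of their top-level topic vectors. All sizes, the toy data, the EM iteration count, and the cosine-similarity ranking are illustrative assumptions.

```python
# Minimal mm-pLSA-style sketch (illustrative assumptions throughout;
# not the paper's exact training or inference rules).
import numpy as np

def plsa(counts, n_topics, n_iter=30, seed=0):
    """Fit pLSA, P(w|d) = sum_z P(w|z) P(z|d), by EM.

    counts: (n_docs, n_words) nonnegative matrix of (soft) counts.
    Returns P(w|z) of shape (n_topics, n_words) and
            P(z|d) of shape (n_docs, n_topics).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w) proportional to P(z|d) P(w|z)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        joint /= joint.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate both conditionals from expected counts
        expected = counts[:, None, :] * joint
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_z, p_z_d

# Leaf layer: toy visual-word and tag-word counts for 100 images.
rng = np.random.default_rng(0)
visual_counts = rng.integers(0, 5, size=(100, 300))  # images x visual words
tag_counts = rng.integers(0, 3, size=(100, 200))     # images x tag words

_, z_vis = plsa(visual_counts, n_topics=50)
_, z_tag = plsa(tag_counts, n_topics=50)

# Top layer: concatenated leaf topic activations act as soft counts
# for the single merging pLSA node.
_, z_top = plsa(np.hstack([z_vis, z_tag]), n_topics=50)

# Query-by-example: rank all images against a query by cosine similarity
# of their 50-dimensional top-level topic vectors.
def rank(query_idx, topics):
    q = topics[query_idx]
    sims = topics @ q / (np.linalg.norm(topics, axis=1)
                         * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims)

print(rank(0, z_top)[:10])  # indices of the ten most similar images
```

Training each layer independently and bottom-up, as above, mirrors the stepwise forward initialization described in the abstract; the paper additionally post-optimizes the whole hierarchy with a joint mm-pLSA learning algorithm, which this sketch omits.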