Geometric Latent Dirichlet Allocation on a Matching Graph for Large-scale Image Datasets

Authors:
James Philbin;Josef Sivic;Andrew Zisserman
Affiliations:
Visual Geometry Group, Department of Engineering Science, University of Oxford, Oxford, UK;INRIA - Willow Project, Laboratoire d'Informatique de l'Ecole Normale Supérieure (CNRS/ENS/INRIA UMR 8548), Paris, France;Visual Geometry Group, Department of Engineering Science, University of Oxford, Oxford, UK
Venue:
International Journal of Computer Vision
Year:
2011

Abstract

Given a large-scale collection of images, our aim is to efficiently associate images that contain the same entity, for example a building or object, and to discover the significant entities. To achieve this, we introduce the Geometric Latent Dirichlet Allocation (gLDA) model for unsupervised discovery of particular objects in unordered image collections. This explicitly represents images as mixtures of particular objects or facades, and builds rich latent topic models which incorporate the identity and locations of visual words specific to the topic in a geometrically consistent way. Applying standard inference techniques to this model enables images likely to contain the same object to be probabilistically grouped and ranked.

Additionally, to reduce the computational cost of applying the gLDA model to large datasets, we propose a scalable method that first computes a matching graph over all the images in a dataset. This matching graph connects images that contain the same object, and rough image groups can be mined from this graph using standard clustering techniques. The gLDA model can then be applied to generate a more nuanced representation of the data. We also discuss how "hub images" (images representative of an object or landmark) can easily be extracted from our matching graph representation.

We evaluate our techniques on the publicly available Oxford buildings dataset (5K images) and show examples of automatically mined objects. The methods are evaluated quantitatively on this dataset using a ground truth labeling for a number of Oxford landmarks. To demonstrate the scalability of the matching graph method, we show qualitative results on two larger datasets of images taken of the Statue of Liberty (37K images) and Rome (1M+ images).
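
The matching-graph stage described above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how edges are scored, which clustering technique is used, or how hub images are chosen. The sketch assumes that each image pair already has a match score (for instance a count of geometrically verified correspondences), that connected components serve as the clustering step, and that the hub of a group is its highest-degree image. The names `pair_scores`, `min_score`, `build_matching_graph`, `connected_components` and `hub_image` are hypothetical.

```python
from collections import defaultdict, deque


def build_matching_graph(pair_scores, min_score):
    """Link two images when their match score reaches min_score (assumed threshold)."""
    graph = defaultdict(set)
    for (a, b), score in pair_scores.items():
        if score >= min_score:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group images by breadth-first search; each component is one rough image group."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in sorted(graph[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components


def hub_image(graph, component):
    """Pick the image with the most neighbours (assumed hub criterion); ties go to the first name."""
    return min(component, key=lambda image: (-len(graph[image]), image))


# Toy scores for illustration only; these are not results from the paper.
pair_scores = {
    ("a.jpg", "b.jpg"): 40,
    ("a.jpg", "c.jpg"): 25,
    ("b.jpg", "c.jpg"): 5,
    ("c.jpg", "d.jpg"): 3,
    ("d.jpg", "e.jpg"): 30,
}

graph = build_matching_graph(pair_scores, min_score=20)
groups = connected_components(graph)
hubs = [hub_image(graph, group) for group in groups]
```

With these toy scores the weak pairs fall below the threshold, so the five images split into two groups. In the pipeline the abstract outlines, each such rough group would then be passed to the gLDA model for a finer-grained representation.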