Modeling Semantic Aspects for Cross-Media Image Indexing

Authors:
Florent Monay;Daniel Gatica-Perez
Affiliations:
-;-
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2007

Abstract

To go beyond the query-by-example paradigm in image retrieval, large image collections need semantic indexing that supports intuitive text-based search. Different models have been proposed to learn the dependencies between the visual content of an image set and the associated text captions, thereby allowing semantic indices to be created automatically for unannotated images. The task, however, remains unsolved. In this paper, we present three alternatives for learning a Probabilistic Latent Semantic Analysis (PLSA) model for annotated images and evaluate their respective performance for automatic image indexing. Under the PLSA assumptions, an image is modeled as a mixture of latent aspects that generates both image features and text captions, and we investigate three ways to learn this mixture of aspects. We also propose an image representation that is more discriminative than the traditional Blob histogram: it concatenates quantized local color information and quantized local texture descriptors. The first learning procedure is a standard EM algorithm, which implicitly assumes that the visual and textual modalities can be treated equivalently. The other two are based on asymmetric PLSA learning, which constrains the definition of the latent space to either the visual or the textual modality. We demonstrate that the textual modality is more appropriate for learning a semantically meaningful latent space, which translates into improved annotation performance. We compare our learning algorithms with recent methods on a standard dataset, and a detailed evaluation of the performance shows the validity of our framework.
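The learning schemes described in the abstract can be illustrated with a minimal sketch. It assumes the textbook PLSA EM updates and is not the authors' implementation: the function name `plsa_em`, the toy count matrices, the number of aspects, and the iteration count are all hypothetical choices made for illustration. The symmetric variant runs EM on concatenated visual and textual counts. The asymmetric variant shown here lets the captions define the aspect mixtures P(z|d), holds those mixtures fixed while fitting P(visual feature|z), and then annotates a new image by estimating its mixture from visual features alone.

```python
import numpy as np

def plsa_em(counts, n_aspects, n_iter=50, p_z_d=None, p_t_z=None, seed=0):
    """Textbook PLSA EM on a (documents x terms) count matrix.

    Any distribution passed in (p_z_d or p_t_z) is held fixed, which gives
    the constrained (asymmetric) and fold-in cases. Hypothetical helper.
    Returns (p_t_z, p_z_d) with shapes (K, T) and (D, K).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = counts.shape
    fix_z, fix_t = p_z_d is not None, p_t_z is not None
    if not fix_z:
        p_z_d = rng.random((n_docs, n_aspects))
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    if not fix_t:
        p_t_z = rng.random((n_aspects, n_terms))
        p_t_z /= p_t_z.sum(axis=1, keepdims=True)
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: P(z|d,t) proportional to P(z|d) P(t|z); shape (D, K, T)
        post = p_z_d[:, :, None] * p_t_z[None, :, :]
        post /= post.sum(axis=1, keepdims=True) + eps
        weighted = counts[:, None, :] * post  # n(d,t) * P(z|d,t)
        # M-step: re-estimate only the distributions that are not fixed
        if not fix_t:
            p_t_z = weighted.sum(axis=0)
            p_t_z /= p_t_z.sum(axis=1, keepdims=True) + eps
        if not fix_z:
            p_z_d = weighted.sum(axis=2)
            p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps
    return p_t_z, p_z_d

# Toy data (assumed): 6 annotated images, 5 caption words, 8 visual terms.
rng = np.random.default_rng(1)
text = rng.integers(1, 4, size=(6, 5)).astype(float)
visual = rng.integers(1, 5, size=(6, 8)).astype(float)
K = 2  # number of latent aspects (arbitrary for this sketch)

# Symmetric learning: one EM run over the concatenated modalities.
p_joint_z, p_z_d_joint = plsa_em(np.hstack([visual, text]), K)

# Asymmetric learning: captions define the latent space ...
p_w_z, p_z_d = plsa_em(text, K)
# ... and the visual model is fitted with those aspect mixtures fixed.
p_v_z, _ = plsa_em(visual, K, p_z_d=p_z_d)

# Annotation: infer P(z|d) for unannotated images from visual terms only,
# then score caption words with P(w|d) = sum_z P(w|z) P(z|d).
new_visual = rng.integers(1, 5, size=(3, 8)).astype(float)
_, p_z_new = plsa_em(new_visual, K, p_t_z=p_v_z)
p_w_new = p_z_new @ p_w_z
```

Each row of `p_w_new` is a distribution over the caption vocabulary for one unannotated image, so ranking its entries yields candidate annotations. Swapping the roles of `text` and `visual` in the asymmetric step gives the variant in which the visual modality defines the latent space.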