A Thousand Words in a Scene

  • Authors:
  • P. Quelhas; F. Monay; J.-M. Odobez; D. Gatica-Perez; T. Tuytelaars

  • Affiliations:
  • IDIAP Research Institute, Martigny

  • Venue:
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Year:
  • 2007

Abstract

This paper presents a novel approach for visual scene modeling and classification, investigating the combined use of text modeling methods and local invariant features. Our work attempts to elucidate (1) whether a text-like bag-of-visterms (BOV) representation (histogram of quantized local visual features) is suitable for scene (rather than object) classification, (2) whether some analogies between discrete scene representations and text documents exist, and (3) whether unsupervised, latent space models can be used both as feature extractors for the classification task and to discover patterns of visual co-occurrence. Using several data sets, we validate our approach, presenting and discussing experiments on each of these issues. We first show, with extensive experiments on binary and multiclass scene classification tasks using a 9,500-image data set, that the BOV representation consistently outperforms classical scene classification approaches. On other data sets, we show that our approach competes with or outperforms other recent, more complex methods. We also show that probabilistic latent semantic analysis (PLSA) generates a compact scene representation, is discriminative for accurate classification, and is more robust than the BOV representation when less labeled training data is available. Finally, through aspect-based image ranking experiments, we show the ability of PLSA to automatically extract visually meaningful scene patterns, making such a representation useful for browsing image collections.
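
For readers who want a concrete picture of the bag-of-visterms idea described in the abstract, the sketch below quantizes precomputed local descriptors (e.g., 128-dimensional SIFT-like vectors) into a visual vocabulary with k-means and builds a normalized visterm histogram per image. This is only an illustrative approximation under assumed names and parameters (`build_vocabulary`, `bov_histogram`, `n_visterms`), not the authors' exact pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptors, n_visterms=1000, seed=0):
    """Quantize a pool of local descriptors (N x D array) into a visual
    vocabulary of n_visterms cluster centers via k-means.
    Note: n_visterms=1000 is an assumed illustrative setting."""
    km = KMeans(n_clusters=n_visterms, n_init=10, random_state=seed)
    km.fit(descriptors)
    return km

def bov_histogram(km, image_descriptors):
    """Map one image's local descriptors to their nearest visterms and
    count them, returning a normalized bag-of-visterms vector."""
    words = km.predict(image_descriptors)
    hist = np.bincount(words, minlength=km.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```

Similarly, the aspect model used as a feature extractor can be approximated with a minimal EM loop for PLSA over the image-by-visterm count matrix. The function below (the name `plsa` and the parameters `n_aspects`, `n_iter` are assumptions for illustration) returns P(z|d), the per-image aspect mixture that serves as a compact representation, and P(w|z), the per-aspect visterm distributions that underlie aspect-based ranking.

```python
import numpy as np

def plsa(counts, n_aspects=20, n_iter=100, seed=0):
    """Minimal EM for PLSA on a document-by-visterm count matrix (D x W).
    Returns P(z|d) with shape (D, Z) and P(w|z) with shape (Z, W)."""
    rng = np.random.default_rng(seed)
    D, W = counts.shape
    p_z_d = rng.random((D, n_aspects))            # P(z|d), random init
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_aspects, W))            # P(w|z), random init
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w) proportional to P(z|d) P(w|z)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]      # D x Z x W
        resp = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate both distributions from n(d,w) P(z|d,w)
        weighted = counts[:, None, :] * resp               # D x Z x W
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z
```

In this reading, the P(z|d) rows play the role of the compact, aspect-based image representation used for classification, while sorting images by a single column of P(z|d) gives a rough analogue of the aspect-based ranking used for browsing; the dense D x Z x W array is acceptable for a sketch but would need a sparser formulation at realistic vocabulary sizes.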