A visual approach for video geocoding using bag-of-scenes

Authors:
Otávio A. B. Penatti;Lin Tzy Li;Jurandy Almeida;Ricardo da S. Torres
Affiliations:
University of Campinas, Campinas, SP, Brazil;University of Campinas, Campinas, SP, Brazil and Telecommunications Res. & Dev. Center, Campinas, SP, Brazil;University of Campinas, Campinas, SP, Brazil;University of Campinas, Campinas, SP, Brazil
Venue:
Proceedings of the 2nd ACM International Conference on Multimedia Retrieval
Year:
2012


Abstract

This paper presents a novel approach for video representation, called bag-of-scenes. The proposed method is based on dictionaries of scenes, which provide a high-level representation for videos. Scenes are elements with much more semantic information than local features, especially for geotagging videos using visual content. Thus, each component of the representation model has self-contained semantics and can therefore be directly related to a specific place of interest. Experiments were conducted in the context of the MediaEval 2011 Placing Task. The reported results compare our strategy with those of other participants that used only visual content to accomplish this task. Despite the very simple way in which we generate the visual dictionary, by selecting photos at random, the results show that our approach achieves high accuracy relative to state-of-the-art solutions.
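To make the idea of a dictionary of scenes more concrete, the following is a minimal sketch of a bag-of-scenes encoding. It is not the authors' implementation. It assumes that every video frame and every candidate photo has already been described by a fixed-length global feature vector (the descriptor itself is left open), that each frame is hard-assigned to its nearest scene under Euclidean distance, and that the assignments are average-pooled into a normalized histogram. The function names `build_scene_dictionary` and `bag_of_scenes` are hypothetical.

```python
import numpy as np


def build_scene_dictionary(photo_features: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Select k photos at random (without replacement); their feature
    vectors serve as the 'scenes' of the dictionary.

    Assumption: photo_features has shape (n_photos, dim) with n_photos >= k.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(photo_features), size=k, replace=False)
    return photo_features[idx]


def bag_of_scenes(frame_features: np.ndarray, dictionary: np.ndarray) -> np.ndarray:
    """Encode a video as a histogram over dictionary scenes.

    Assumptions: hard assignment of each frame to its nearest scene and
    average pooling (the histogram is divided by the number of frames).
    """
    # Squared Euclidean distance from every frame to every scene: (n_frames, k)
    diffs = frame_features[:, None, :] - dictionary[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    # Index of the closest scene for each frame
    nearest = dists.argmin(axis=1)
    # Count frames per scene, keeping empty bins, then normalize
    hist = np.bincount(nearest, minlength=len(dictionary)).astype(float)
    return hist / len(frame_features)


if __name__ == "__main__":
    # Toy data: two 2-D "scenes" and a three-frame video
    scenes = np.array([[0.0, 0.0], [10.0, 10.0]])
    frames = np.array([[1.0, 1.0], [9.0, 9.0], [0.0, 1.0]])
    print(bag_of_scenes(frames, scenes))
```

Because each histogram bin corresponds to one photo, a bin's weight can be read as the fraction of the video's frames that most resemble that photo, which is what lets a component be related to a place of interest. Soft assignment or max pooling would be equally reasonable choices under this sketch.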