Location Discriminative Vocabulary Coding for Mobile Landmark Search

Authors:
Rongrong Ji; Ling-Yu Duan; Jie Chen; Hongxun Yao; Junsong Yuan; Yong Rui; Wen Gao
Affiliations:
Institute of Digital Media, Peking University, Beijing, China and Visual Intelligence Laboratory, Harbin Institute of Technology, Harbin, China; Institute of Digital Media, Peking University, Beijing, China; Institute of Digital Media, Peking University, Beijing, China; Visual Intelligence Laboratory, Harbin Institute of Technology, Harbin, China; School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore; Microsoft China Research and Development Group, Beijing, China; Institute of Digital Media, Peking University, Beijing, China
Venue:
International Journal of Computer Vision
Year:
2012

Abstract

With the popularization of mobile devices, recent years have witnessed an emerging potential for mobile landmark search. In this scenario, the user experience depends heavily on the efficiency of query transmission over a wireless link. Because sending a query photo is time consuming, recent works have proposed extracting compact visual descriptors directly on the mobile end to enable low bit rate transmission. Typically, these descriptors are extracted based solely on the visual content of a query, and the location cues from the mobile end are rarely exploited. In this paper, we present a Location Discriminative Vocabulary Coding (LDVC) scheme, which achieves extremely low bit rate query transmission, discriminative landmark description, and scalable descriptor delivery in a unified framework. Our first contribution is a compact and location discriminative visual landmark descriptor, which is learnt offline in two steps. First, we adopt spectral clustering to segment a city map into distinct geographical regions, where both visual and geographical similarities are fused to optimize the partition of city-scale geo-tagged photos. Second, we propose to learn LDVC in each region with two schemes: (1) a Ranking Sensitive PCA and (2) a Ranking Sensitive Vocabulary Boosting. Both schemes embed location cues to learn a compact descriptor that minimizes the retrieval ranking loss incurred by replacing the original high-dimensional signatures. Our second contribution is a location aware online vocabulary adaptation: we store a single vocabulary on the mobile end, which is efficiently adapted for region specific LDVC coding once a mobile device enters a given region. The learnt LDVC landmark descriptor is extremely compact (typically 10 to 50 bits with arithmetic coding) and outperforms state-of-the-art descriptors. We implemented the framework in a real-world mobile landmark search prototype, which is validated on a million-scale landmark database covering typical areas such as Beijing, New York City, Lhasa, Singapore, and Florence.
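
The first offline step described in the abstract (spectral clustering over fused visual and geographical similarities) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the fusion rule (a product of two Gaussian kernels), the bandwidths `sigma_geo` and `sigma_vis`, the function names, and the toy photos. The sketch also splits the photos into only two regions using the second eigenvector of the normalized affinity matrix, whereas a city-scale partition would need a k-way variant.

```python
import math

def fused_affinity(photos, sigma_geo=2.0, sigma_vis=1.0):
    """Build W[i][j] = geo kernel * visual kernel (assumed fusion rule).

    Each photo is a pair (geo_coords, visual_signature) of numeric tuples.
    """
    n = len(photos)
    W = [[0.0] * n for _ in range(n)]
    for i, (geo_i, vis_i) in enumerate(photos):
        for j, (geo_j, vis_j) in enumerate(photos):
            if i != j:
                k_geo = math.exp(-(math.dist(geo_i, geo_j) / sigma_geo) ** 2)
                k_vis = math.exp(-(math.dist(vis_i, vis_j) / sigma_vis) ** 2)
                W[i][j] = k_geo * k_vis
    return W

def spectral_bipartition(W, iters=200):
    """Two-way spectral clustering by power iteration (pure Python).

    Works on M = (I + D^-1/2 W D^-1/2) / 2, whose eigenvalues lie in [0, 1].
    The trivial top eigenvector D^1/2 * 1 is projected out, so the iteration
    converges to the second eigenvector; its signs give the two regions.
    Assumes every photo has a nonzero total affinity (degree).
    """
    n = len(W)
    deg = [sum(row) for row in W]
    root = [math.sqrt(d) for d in deg]
    M = [[0.5 * ((i == j) + W[i][j] / (root[i] * root[j])) for j in range(n)]
         for i in range(n)]
    norm = math.sqrt(sum(deg))
    top = [r / norm for r in root]            # unit-length trivial eigenvector
    v = [float(i + 1) for i in range(n)]      # arbitrary deterministic start
    for _ in range(iters):
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]                   # deflate
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(a * a for a in v))
        v = [a / length for a in v]
    return [int(x > 0) for x in v]

# Toy geo-tagged photos: (map position, 2-D visual signature). Hypothetical data.
photos = [
    ((0.0, 0.0), (1.0, 0.0)), ((0.5, 0.2), (0.9, 0.1)), ((0.1, 0.6), (1.0, 0.2)),
    ((10.0, 10.0), (0.0, 1.0)), ((10.4, 9.8), (0.1, 0.9)), ((9.7, 10.3), (0.2, 1.0)),
]
labels = spectral_bipartition(fused_affinity(photos))
print(labels)
```

Multiplying the two kernels means two photos are treated as similar only when they are both geographically close and visually alike; a weighted sum would be an equally plausible choice, and the abstract does not say which the authors use.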