Unveiling locations in geo-spatial documents

Authors:
Gyan Ranjan;Juong-Sik Lee;Deepti Chafekar;Umesh Chandra
Affiliations:
University of Minnesota, Twin Cities, MN;Nokia Research Center, Palo Alto, CA;Nokia Research Center, Palo Alto, CA;Nokia Research Center, Palo Alto, CA
Venue:
Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
Year:
2011

Abstract

Resolving the geo-identities of addresses in emerging economies, where users rely primarily on short messaging as the means of querying, poses several daunting challenges: the lack of proper addressing schemes, the non-availability of cartographic information, and non-standardized nomenclature for geo-spatial entities such as streets and avenues, to name a few. In this work, we propose a simple and elegant approach to this problem. By treating address texts as short documents and exploiting the latent proximity information they contain (for example, landmark-like references and similarity between address texts), we transform the problem of resolving geo-identity into a search problem over short annotated geo-spatial documents, collected through an extensive survey of six cities in India. Our solution spans all phases of building a geo-identity resolution system, though our emphasis is on collecting and organizing the corpus to facilitate a search-engine backend for the task. Through experiments on a representative test set collected from the real world, we demonstrate that this approach achieves over 94% accuracy in resolution and an order-of-magnitude reduction in system state (memory) with nearly zero false negatives, a significant improvement over the state of the art in emerging markets.
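
To illustrate the idea of resolving an address by searching short annotated documents, here is a minimal sketch in Python. It is not the authors' system: the abstract does not specify the ranking function, so TF-IDF weighting with cosine similarity is swapped in as a common stand-in. The corpus entries, the coordinates, and the names CORPUS, build_index and resolve are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical annotated corpus: (address text, (lat, lon)).
# Texts and coordinates are invented for illustration only.
CORPUS = [
    ("near city hospital mg road", (12.9750, 77.6060)),
    ("opposite central bus stand station road", (12.9780, 77.5720)),
    ("behind old temple market street", (12.9600, 77.5800)),
]


def tokenize(text):
    """Lowercase and split on whitespace; a real system would normalize more."""
    return text.lower().split()


def build_index(corpus):
    """Return per-document term counts and smoothed IDF weights."""
    docs = [Counter(tokenize(text)) for text, _ in corpus]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in doc)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return docs, idf


def vectorize(counts, idf):
    """Map term counts to TF-IDF weights, dropping terms unseen in the corpus."""
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def resolve(query, corpus, docs, idf):
    """Return ((lat, lon), score) of the best match, or None if nothing matches."""
    q = vectorize(Counter(tokenize(query)), idf)
    scored = [(cosine(q, vectorize(d, idf)), i) for i, d in enumerate(docs)]
    best_score, best_i = max(scored)
    if best_score == 0.0:
        return None
    return corpus[best_i][1], best_score


if __name__ == "__main__":
    docs, idf = build_index(CORPUS)
    print(resolve("city hospital", CORPUS, docs, idf))
    print(resolve("airport terminal", CORPUS, docs, idf))
```

In this sketch a short query such as "city hospital" is matched against the landmark-like terms in the stored address texts, and the annotation of the best-scoring document is returned as the resolved location. A query that shares no terms with the corpus returns None rather than a guess.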