Automatic gazetteer enrichment with user-geocoded data

  • Authors:
  • Judith Gelernter;Gautam Ganesh;Hamsini Krishnakumar;Wei Zhang

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA;University of Texas at Dallas, Richardson, TX;Guindy Anna University, Chennai, India;Carnegie Mellon University, Pittsburgh, PA

  • Venue:
  • Proceedings of the Second ACM SIGSPATIAL International Workshop on Crowdsourced and Volunteered Geographic Information
  • Year:
  • 2013

Abstract

Geographical knowledge resources, or gazetteers, that are enriched with local information have the potential to add geographic precision to information retrieval. We identified sources of novel local gazetteer entries in crowd-sourced OpenStreetMap and Wikimapia geotags that include geo-coordinates. We created a fuzzy match algorithm using machine learning (a support vector machine, SVM) that checks for both approximate spelling and approximate geocoding in order to find duplicates between the crowd-sourced tags and the gazetteer, in an effort to absorb only those tags that are novel. For each crowd-sourced tag, our algorithm generates candidate matches from the gazetteer and then ranks those candidates based on word form or geographical relations between the tag and each gazetteer candidate. We compared a baseline that ranks candidates by edit distance to an SVM-trained candidate ranking model on a city-level location tag matching task. Experimental results show that the SVM greatly outperforms the baseline.
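
To make the candidate-generation and ranking steps concrete, the following is a minimal Python sketch of the edit-distance baseline only, not the authors' SVM model. The abstract does not say how candidates are generated, so the distance-radius filter, the `radius_km` and `max_edits` thresholds, the `(name, lat, lon)` tuple layout, and all function names are assumptions made for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def rank_candidates(tag, gazetteer, radius_km=1.0):
    """Assumed candidate generation: keep gazetteer entries within radius_km
    of the tag, then rank them by edit distance on lowercased names."""
    name, lat, lon = tag
    nearby = [g for g in gazetteer
              if haversine_km(lat, lon, g[1], g[2]) <= radius_km]
    return sorted(nearby, key=lambda g: edit_distance(name.lower(), g[0].lower()))


def is_novel(tag, gazetteer, radius_km=1.0, max_edits=2):
    """Treat a tag as novel if no nearby entry is within max_edits of its name
    (hypothetical decision rule, not taken from the paper)."""
    ranked = rank_candidates(tag, gazetteer, radius_km)
    if not ranked:
        return True
    return edit_distance(tag[0].lower(), ranked[0][0].lower()) > max_edits


# Made-up entries for illustration; not data from the paper.
gazetteer = [
    ("Riverside Park", 40.4400, -79.9900),
    ("Riverside Parking Garage", 40.4401, -79.9901),
    ("Central Library", 40.5000, -80.1000),
]
misspelled_tag = ("Riversde Park", 40.4402, -79.9902)
distant_tag = ("Hilltop Playground", 41.0000, -79.0000)
```

Here `is_novel(misspelled_tag, gazetteer)` flags the misspelled tag as a duplicate of an existing entry, while `distant_tag` has no nearby candidates and would be absorbed as a new entry. A learned ranker such as the SVM described in the abstract would replace the `sorted` key with a score built from both word-form and geographic features.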