Mining the web for points of interest

  • Authors:
  • Adam Rae;Vanessa Murdock;Adrian Popescu;Hugues Bouchard

  • Affiliations:
  • Yahoo! Research, Barcelona, Spain;Yahoo! Research, Barcelona, Spain;CEA, LIST, Gif-sur-Yvette, France, France;Yahoo! Research, Barcelona, Spain

  • Venue:
  • SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

A point of interest (POI) is a focused geographic entity such as a landmark, a school, an historical building, or a business. Points of interest are the basis for most of the data supporting location-based applications. In this paper we propose to curate POIs from online sources by bootstrapping training data from Web snippets, seeded by POIs gathered from social media. This large corpus is used to train a sequential tagger to recognize mentions of POIs in text. Using Wikipedia data as the training data, we can identify POIs in free text with an accuracy that is 116% better than the state of the art POI identifier in terms of precision, and 50% better in terms of recall. We show that using Foursquare and Gowalla checkins as seeds to bootstrap training data from Web snippets, we can improve precision between 16% and 52%, and recall between 48% and 187% over the state-of-the-art. The name of a POI is not sufficient, as the POI must also be associated with a set of geographic coordinates. Our method increases the number of POIs that can be localized nearly three-fold, from 134 to 395 in a sample of 400, with a median localization accuracy of less than one kilometer.