Mining the web for points of interest

Authors:
Adam Rae;Vanessa Murdock;Adrian Popescu;Hugues Bouchard
Affiliations:
Yahoo! Research, Barcelona, Spain;Yahoo! Research, Barcelona, Spain;CEA, LIST, Gif-sur-Yvette, France;Yahoo! Research, Barcelona, Spain
Venue:
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Year:
2012

Abstract

A point of interest (POI) is a focused geographic entity such as a landmark, a school, a historical building, or a business. Points of interest are the basis for most of the data supporting location-based applications. In this paper we propose to curate POIs from online sources by bootstrapping training data from Web snippets, seeded by POIs gathered from social media. This large corpus is used to train a sequential tagger to recognize mentions of POIs in text. Using Wikipedia data as the training data, we can identify POIs in free text with a precision that is 116% better than that of the state-of-the-art POI identifier, and a recall that is 50% better. We show that by using Foursquare and Gowalla check-ins as seeds to bootstrap training data from Web snippets, we can improve precision by between 16% and 52%, and recall by between 48% and 187%, over the state of the art. The name of a POI is not sufficient on its own, as the POI must also be associated with a set of geographic coordinates. Our method increases the number of POIs that can be localized nearly three-fold, from 134 to 395 in a sample of 400, with a median localization accuracy of less than one kilometer.
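
The bootstrapping step described in the abstract can be illustrated with a minimal sketch: seed POI names are matched against snippet text to produce token-level labels that a sequential tagger could then be trained on. The abstract does not specify the labeling scheme or the matching rules, so the BIO tags, the case-insensitive exact matching, and the names `tokenize` and `bio_label` are all assumptions made for illustration, not the authors' implementation.

```python
import re


def tokenize(text):
    # Split into word tokens and single punctuation marks (a simplifying assumption).
    return re.findall(r"\w+|[^\w\s]", text)


def bio_label(snippet, seed_names):
    """Label snippet tokens with B-POI / I-POI / O using seed POI names.

    Hypothetical helper: case-insensitive exact token matching,
    with longer seed names taking priority over shorter ones.
    """
    tokens = tokenize(snippet)
    lowered = [t.lower() for t in tokens]
    labels = ["O"] * len(tokens)

    # Tokenize seeds and try the longest names first.
    seeds = sorted((tokenize(name.lower()) for name in seed_names),
                   key=len, reverse=True)

    for seed in seeds:
        n = len(seed)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            span_free = all(label == "O" for label in labels[i:i + n])
            if lowered[i:i + n] == seed and span_free:
                labels[i] = "B-POI"
                for j in range(i + 1, i + n):
                    labels[j] = "I-POI"

    return list(zip(tokens, labels))


if __name__ == "__main__":
    snippet = "We visited the Sagrada Familia in Barcelona."
    for token, label in bio_label(snippet, ["Sagrada Familia"]):
        print(token, label)
```

Labeled sequences of this kind would serve as training input for the tagger. In practice the seed names would come from check-in data and the snippets from Web search results, neither of which is modeled here.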