Modeling locations with social media

Authors:
Neil O'Hare; Vanessa Murdock
Affiliations:
Yahoo! Research, Barcelona, Spain; Yahoo! Research, Barcelona, Spain
Venue:
Information Retrieval
Year:
2013

Abstract

In this paper we focus on the locations that are explicit and implicit in users' descriptions of their surroundings. We propose a statistical language modeling approach to identifying locations in arbitrary text, and investigate several ways to estimate the models, based on term frequency and on user frequency. The geotagged public photos in Flickr serve as a convenient ground truth. Our results show that, using only a photo's tags, we can predict its location within a one kilometer by one kilometer cell with 17% accuracy, and within a three kilometer radius around such a cell with 40% accuracy. This is significantly better than the state of the art. Further, we examine several estimation strategies that leverage the physical proximity of places, and show that for sparsely represented locations, smoothing from the immediate neighborhood improves results. We also show that estimation strategies based on user frequency are much more reliable than approaches based on raw term frequency.
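The contrast between term-frequency and user-frequency estimation can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the function names `build_counts` and `predict_cell`, the uniform prior over cells, and the add-alpha smoothing are all assumptions made for illustration, and the neighborhood smoothing mentioned in the abstract is left out.

```python
import math
from collections import defaultdict

def build_counts(photos, by_user=True):
    """Return {cell: {tag: count}} from (user, cell, tags) triples.

    by_user=True  -> user frequency: a user counts once per (cell, tag).
    by_user=False -> term frequency: every photo carrying the tag counts.
    """
    seen = set()
    counts = defaultdict(lambda: defaultdict(int))
    for user, cell, tags in photos:
        for tag in set(tags):
            key = (user, cell, tag)
            if by_user and key in seen:
                continue  # this user already contributed this tag in this cell
            seen.add(key)
            counts[cell][tag] += 1
    return counts

def predict_cell(counts, tags, alpha=1.0):
    """Pick the cell whose smoothed unigram model best explains the tags.

    Assumes a uniform prior over cells and add-alpha smoothing; both are
    illustrative choices, not taken from the paper.
    """
    vocab = {t for cell_counts in counts.values() for t in cell_counts}
    best_cell, best_score = None, -math.inf
    for cell, tag_counts in counts.items():
        total = sum(tag_counts.values())
        denom = total + alpha * len(vocab)
        score = sum(math.log((tag_counts.get(t, 0) + alpha) / denom) for t in tags)
        if score > best_score:
            best_cell, best_score = cell, score
    return best_cell

# Hypothetical toy data: three different users tag "beach" in cell (0, 0),
# while one prolific user uploads ten "beach" photos in cell (5, 5).
photos = [
    ("u1", (0, 0), ["beach", "sea"]),
    ("u2", (0, 0), ["beach"]),
    ("u3", (0, 0), ["beach", "sand"]),
] + [("u9", (5, 5), ["beach", "party"])] * 10

tf_guess = predict_cell(build_counts(photos, by_user=False), ["beach"])
uf_guess = predict_cell(build_counts(photos, by_user=True), ["beach"])
```

On this toy data the term-frequency model is pulled toward the cell dominated by a single prolific uploader, while the user-frequency model prefers the cell where several independent users agree. This is one plausible reading of why user-based estimates might be more reliable.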