Supervised text-based geolocation using language models on an adaptive grid

Authors:
Stephen Roller; Michael Speriosu; Sarat Rallapalli; Benjamin Wing; Jason Baldridge
Affiliations:
University of Texas at Austin (all authors)
Venue:
EMNLP-CoNLL '12: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Year:
2012

Abstract

The geographical properties of words have recently begun to be exploited for geolocating documents based solely on their text, often in the context of social media and online content. One common approach for geolocating texts is rooted in information retrieval. Given training documents labeled with latitude/longitude coordinates, a grid is overlaid on the Earth, and pseudo-documents are constructed by concatenating the documents within each grid cell; a test document is then assigned the location of the most similar pseudo-document. Uniform grids are normally used, but they are sensitive to how documents are dispersed over the Earth. We define an alternative grid construction using k-d trees that adapts more robustly to the data, especially with larger training sets. We also provide a better way of choosing the locations for pseudo-documents. We evaluate these strategies on existing Wikipedia and Twitter corpora, as well as on a new, larger Twitter corpus. The adaptive grid achieves results competitive with a uniform grid on small training sets and outperforms it on the large Twitter corpus. The two grid constructions can also be combined to produce consistently strong results across all training sets.
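The grid-plus-pseudo-document pipeline summarized above can be sketched in a few dozen lines of Python. This is a minimal, non-authoritative sketch under several assumptions. Documents are hypothetical `(latitude, longitude, tokens)` tuples. The adaptive grid is a k-d tree whose leaves (at most `bucket_size` documents each) become cells, split at the median of whichever coordinate has the larger spread; this is one common k-d tree variant and not necessarily the paper's splitting rule. Each pseudo-document is placed at the centroid of its member documents rather than at its cell's center, which is our assumption about what a better location choice could look like. A test document is scored with an additively smoothed unigram log-likelihood and no cell prior, which stands in for whatever similarity measure and smoothing the paper actually uses. All names (`kd_leaves`, `uniform_cells`, `PseudoDoc`, `build`, `predict`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical document representation: (latitude, longitude, tokens).
Doc = tuple[float, float, list[str]]


def kd_leaves(docs: list[Doc], bucket_size: int) -> list[list[Doc]]:
    """Adaptive grid: split at the median of the wider coordinate until each
    leaf holds at most bucket_size (>= 1) documents. Each leaf is one cell."""
    if len(docs) <= bucket_size:
        return [docs]
    lat_span = max(d[0] for d in docs) - min(d[0] for d in docs)
    lon_span = max(d[1] for d in docs) - min(d[1] for d in docs)
    axis = 0 if lat_span >= lon_span else 1
    ordered = sorted(docs, key=lambda d: d[axis])
    mid = len(ordered) // 2  # both halves are non-empty and strictly smaller
    return kd_leaves(ordered[:mid], bucket_size) + kd_leaves(ordered[mid:], bucket_size)


def uniform_cells(docs: list[Doc], degrees: float) -> list[list[Doc]]:
    """Uniform grid: bucket documents into degrees x degrees lat/lon cells."""
    cells: dict[tuple[int, int], list[Doc]] = {}
    for d in docs:
        key = (math.floor(d[0] / degrees), math.floor(d[1] / degrees))
        cells.setdefault(key, []).append(d)
    return list(cells.values())


class PseudoDoc:
    """Concatenation of all training documents in one cell."""

    def __init__(self, docs: list[Doc]):
        self.counts = Counter(tok for d in docs for tok in d[2])
        self.total = sum(self.counts.values())
        # Assumed location choice: centroid of member documents (naive mean of
        # lat/lon, which ignores the antimeridian), not the cell's center.
        self.lat = sum(d[0] for d in docs) / len(docs)
        self.lon = sum(d[1] for d in docs) / len(docs)

    def log_likelihood(self, tokens: list[str], vocab_size: int, alpha: float = 1.0) -> float:
        # Additively smoothed unigram model; unseen words get alpha / denom.
        denom = self.total + alpha * vocab_size
        return sum(math.log((self.counts[t] + alpha) / denom) for t in tokens)


def build(cells: list[list[Doc]]) -> tuple[list[PseudoDoc], int]:
    """Turn non-empty cells into pseudo-documents and return the vocab size."""
    pseudo = [PseudoDoc(c) for c in cells if c]
    vocab = set().union(*(p.counts for p in pseudo))
    return pseudo, len(vocab) + 1  # +1 reserves a slot for unseen words


def predict(pseudo: list[PseudoDoc], vocab_size: int, tokens: list[str]) -> tuple[float, float]:
    """Return the location of the highest-scoring pseudo-document."""
    best = max(pseudo, key=lambda p: p.log_likelihood(tokens, vocab_size))
    return best.lat, best.lon


if __name__ == "__main__":
    train = [
        (30.27, -97.74, ["longhorns", "bbq", "austin"]),
        (30.30, -97.70, ["austin", "tacos", "bbq"]),
        (40.71, -74.00, ["subway", "bagel", "nyc"]),
        (40.73, -73.99, ["nyc", "broadway", "bagel"]),
    ]
    for name, cells in [("k-d", kd_leaves(train, 2)), ("uniform", uniform_cells(train, 1.0))]:
        pseudo, v = build(cells)
        print(name, len(cells), predict(pseudo, v, ["bbq", "austin"]))
```

Swapping `kd_leaves(train, bucket_size)` for `uniform_cells(train, degrees)` changes only how cells are formed; pseudo-document construction and scoring stay the same, which is what makes the two grids easy to compare. The k-d leaves shrink where documents are dense and grow where they are sparse, whereas uniform cells have a fixed angular size regardless of how many documents fall in them. Averaging raw latitude/longitude is a simplification that misbehaves near the antimeridian and the poles.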