The geographical properties of words have recently begun to be exploited for geolocating documents based solely on their text, often in the context of social media and online content. One common approach to geolocating texts is rooted in information retrieval. Given training documents labeled with latitude/longitude coordinates, a grid is overlaid on the Earth and pseudo-documents are constructed by concatenating the documents within each grid cell; a location for a test document is then chosen based on the most similar pseudo-document. Uniform grids are normally used, but they are sensitive to the dispersion of documents over the Earth. We define an alternative grid construction using k-d trees that adapts more robustly to the data, especially with larger training sets. We also provide a better way of choosing the locations for pseudo-documents. We evaluate these strategies on existing Wikipedia and Twitter corpora, as well as a new, larger Twitter corpus. The adaptive grid achieves results competitive with a uniform grid on small training sets and outperforms it on the large Twitter corpus. The two grid constructions can also be combined to produce consistently strong results across all training sets.
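The pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the function names (`build_kd_cells`, `make_pseudo_documents`, `locate`), the `bucket_size` parameter, splitting at the median along the coordinate with the larger spread, using the centroid of a cell's training documents as the pseudo-document location, and scoring with an add-one-smoothed unigram log-likelihood. The abstract specifies none of these choices.

```python
import math
from collections import Counter


def build_kd_cells(docs, bucket_size):
    """Split (lat, lon, tokens) docs into leaf cells of at most bucket_size docs.

    Assumption: split at the median along the coordinate with the larger spread.
    """
    if bucket_size < 1:
        raise ValueError("bucket_size must be at least 1")
    if len(docs) <= bucket_size:
        return [docs]
    lats = [d[0] for d in docs]
    lons = [d[1] for d in docs]
    axis = 0 if (max(lats) - min(lats)) >= (max(lons) - min(lons)) else 1
    ordered = sorted(docs, key=lambda d: d[axis])
    mid = len(ordered) // 2
    return (build_kd_cells(ordered[:mid], bucket_size)
            + build_kd_cells(ordered[mid:], bucket_size))


def make_pseudo_documents(cells):
    """Concatenate each cell's documents into term counts with one location.

    Assumption: the location is the plain mean of the documents' coordinates,
    which ignores wrap-around at the 180th meridian.
    """
    pseudo = []
    for cell in cells:
        counts = Counter()
        for _, _, tokens in cell:
            counts.update(tokens)
        lat = sum(d[0] for d in cell) / len(cell)
        lon = sum(d[1] for d in cell) / len(cell)
        pseudo.append(((lat, lon), counts))
    return pseudo


def locate(tokens, pseudo, vocab_size):
    """Return the location of the pseudo-document that best explains tokens.

    Assumption: similarity is an add-one-smoothed unigram log-likelihood.
    """
    def score(counts):
        total = sum(counts.values())
        return sum(math.log((counts[t] + 1) / (total + vocab_size))
                   for t in tokens)

    return max(pseudo, key=lambda p: score(p[1]))[0]


# Toy training data (made up for illustration).
train = [
    (55.86, -4.25, ["glasgow", "clyde", "sandwich"]),
    (55.95, -3.19, ["edinburgh", "castle"]),
    (40.71, -74.00, ["brooklyn", "subway"]),
    (40.73, -73.99, ["manhattan", "subway", "bagel"]),
]
cells = build_kd_cells(train, bucket_size=2)
pseudo = make_pseudo_documents(cells)
vocab_size = len({t for _, _, toks in train for t in toks})
guess = locate(["subway", "bagel"], pseudo, vocab_size)
```

Because each split happens at the median, dense regions end up with many small cells and sparse regions with a few large ones, which is the sense in which a k-d tree grid adapts to the dispersion of the training documents.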