Social Internet content plays an increasingly critical role in many domains, including public health, disaster management, and politics. However, its utility is limited by missing geographic information; for example, fewer than 1.6% of Twitter messages (tweets) contain a geotag. We propose a scalable, content-based approach to estimate the location of tweets using a novel yet simple variant of Gaussian mixture models. Further, because real-world applications depend on quantified uncertainty for such estimates, we propose novel metrics of accuracy, precision, and calibration, and we evaluate our approach accordingly. Experiments on 13 million global, comprehensively multilingual tweets show that our approach yields reliable, well-calibrated results competitive with previous, computationally intensive methods. We also show that only a relatively small amount of training data (roughly 30,000 tweets) is required for good estimates, and that models are quite time-invariant (effective on tweets many weeks newer than the training set). Finally, we show that toponyms and languages with a small geographic footprint provide the most useful location signals.
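
To make the content-based idea concrete, the following is a minimal sketch of one way a tweet's location could be estimated from a mixture of per-token Gaussians. It is an illustration under stated assumptions, not the model evaluated here: each token gets a single axis-aligned Gaussian over (latitude, longitude), coordinates are treated as planar (spherical geometry and longitude wraparound are ignored), and the inverse-footprint weighting is a hypothetical heuristic chosen so that tightly localized tokens dominate. The names `fit_token_gaussians`, `mixture_for`, `density`, and `point_estimate` are invented for this example.

```python
import math
from collections import defaultdict


def fit_token_gaussians(tweets, min_var=1e-3):
    """Fit one axis-aligned 2D Gaussian per token.

    tweets: iterable of (tokens, (lat, lon)) pairs from geotagged training data.
    Returns a dict mapping token -> (mean_lat, mean_lon, var_lat, var_lon).
    min_var is an assumed variance floor that avoids degenerate Gaussians.
    """
    points = defaultdict(list)
    for tokens, coord in tweets:
        for tok in set(tokens):
            points[tok].append(coord)

    models = {}
    for tok, pts in points.items():
        n = len(pts)
        mean_lat = sum(p[0] for p in pts) / n
        mean_lon = sum(p[1] for p in pts) / n
        var_lat = max(sum((p[0] - mean_lat) ** 2 for p in pts) / n, min_var)
        var_lon = max(sum((p[1] - mean_lon) ** 2 for p in pts) / n, min_var)
        models[tok] = (mean_lat, mean_lon, var_lat, var_lon)
    return models


def mixture_for(models, tokens):
    """Build a tweet's mixture as a list of (weight, component) pairs.

    Assumed heuristic: weight is proportional to 1 / sqrt(var_lat * var_lon),
    so tokens with a small geographic footprint receive more weight.
    """
    comps = [models[t] for t in set(tokens) if t in models]
    if not comps:
        return []
    raw = [1.0 / math.sqrt(c[2] * c[3]) for c in comps]
    total = sum(raw)
    return [(w / total, c) for w, c in zip(raw, comps)]


def density(models, tokens, lat, lon):
    """Evaluate the tweet's mixture density at (lat, lon)."""
    value = 0.0
    for weight, (m_lat, m_lon, v_lat, v_lon) in mixture_for(models, tokens):
        z = (lat - m_lat) ** 2 / v_lat + (lon - m_lon) ** 2 / v_lon
        norm = 2.0 * math.pi * math.sqrt(v_lat * v_lon)
        value += weight * math.exp(-0.5 * z) / norm
    return value


def point_estimate(models, tokens):
    """Return the mixture mean as (lat, lon), or None if no token is known."""
    mix = mixture_for(models, tokens)
    if not mix:
        return None
    lat = sum(w * c[0] for w, c in mix)
    lon = sum(w * c[1] for w, c in mix)
    return (lat, lon)
```

Because the estimate is a full density rather than a single point, the same mixture could in principle also be used to derive prediction regions, which is the kind of quantified uncertainty that calibration metrics assess.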