Unsupervised learning by probabilistic latent semantic analysis
Machine Learning
A probabilistic approach to spatiotemporal theme pattern mining on weblogs
Proceedings of the 15th international conference on World Wide Web
Mining geographic knowledge using location aware topic model
Proceedings of the 4th ACM workshop on Geographical information retrieval
Topical N-Grams: Phrase and Topic Discovery, with an Application to Information Retrieval
ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies
Journal of the ACM (JACM)
GeoFolk: latent spatial semantics in web 2.0 social media
Proceedings of the third ACM international conference on Web search and data mining
Equip tourists with knowledge mined from travelogues
Proceedings of the 19th international conference on World Wide Web
A latent variable model for geographic lexical variation
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Geographical topic discovery and comparison
Proceedings of the 20th international conference on World Wide Web
Unified analysis of streaming news
Proceedings of the 20th international conference on World Wide Web
Simple supervised document geolocation with geodesic grids
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Friendship and mobility: user movement in location-based social networks
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Scalable inference in latent variable models
Proceedings of the fifth ACM international conference on Web search and data mining
Discovering geographical topics in the Twitter stream
Proceedings of the 21st international conference on World Wide Web
Supervised text-based geolocation using language models on an adaptive grid
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Distributed large-scale natural graph factorization
Proceedings of the 22nd international conference on World Wide Web
How the live web feels about events
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Detecting non-gaussian geographical topics in tagged photo collections
Proceedings of the 7th ACM international conference on Web search and data mining
With the availability of cheap location sensors, geotagging of messages in online social networks is proliferating. For instance, Twitter, Facebook, Foursquare, and Google+ provide these services either explicitly, by letting users choose their location, or implicitly, via a sensor. This paper presents an integrated generative model of location and message content. That is, we provide a model that combines distributions over locations, over topics, and over user characteristics, both in terms of location and in terms of content preferences. Unlike previous work, which modeled data in a flat, predefined representation, our model automatically infers both the hierarchical structure over content and the size and position of geographical locations. This yields significantly higher accuracy: location uncertainty is reduced by 40% relative to the best previous results [21] on location estimation from tweets. We achieve this by proposing a new statistical model, the nested Chinese Restaurant Franchise (nCRF), a hierarchical model of tree distributions. Much statistical structure is shared between users; that said, each user has their own distribution over interests and places. The use of the nCRF allows us to capture the following effects: (1) we provide a topic model for tweets; (2) we obtain location-specific topics; (3) we infer a latent distribution of locations; (4) we provide a joint hierarchical model of topics and locations; and (5) we infer personalized preferences over topics and locations within the above model. As a result, we can both obtain accurate estimates of a user's location from their tweets and obtain a detailed estimate of a geographical language model.
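The nCRF is described above only at a high level. The following is a simplified, hypothetical sketch of the franchise idea. It assumes that per-user Chinese restaurant processes are layered over a shared nested Chinese restaurant process (the nCRP listed among the references). At every node, a user reuses a child they have visited before in proportion to their own visit count. With weight `alpha`, the user instead defers to the shared tree, which reuses an existing child in proportion to how often users have deferred to it, or opens a new child with weight `gamma`. All names (`Node`, `global_step`, `user_step`, `sample_path`, `alpha`, `gamma`) are illustrative assumptions. The sketch draws only tree paths from the prior. It omits the topic and location emission models and all inference, and it is not the paper's implementation.

```python
import random


class Node:
    """Node of the shared tree (hypothetical: stands for one topic/region cluster)."""

    def __init__(self, depth, parent=None):
        self.depth = depth
        self.parent = parent
        self.children = []
        # How many times users deferred to the shared tree and landed here
        # (the franchise-level "table" count in this simplified sketch).
        self.global_count = 0


def global_step(node, gamma, rng):
    """Shared CRP step: reuse child c with weight c.global_count, or open a
    new child with weight gamma."""
    weights = [c.global_count for c in node.children] + [gamma]
    idx = rng.choices(range(len(weights)), weights=weights)[0]
    if idx == len(node.children):
        node.children.append(Node(node.depth + 1, node))
    child = node.children[idx]
    child.global_count += 1
    return child


def user_step(node, counts, alpha, gamma, rng):
    """Per-user CRP step: reuse a child this user already visited, weighted by
    the user's own visit count, or (weight alpha) defer to the shared tree."""
    own = [c for c in node.children if counts.get(c, 0) > 0]
    weights = [counts[c] for c in own] + [alpha]
    idx = rng.choices(range(len(weights)), weights=weights)[0]
    if idx < len(own):
        return own[idx]
    return global_step(node, gamma, rng)


def sample_path(root, counts, depth, alpha, gamma, rng):
    """Draw a root-to-leaf path of fixed depth for one post and record it in
    that user's counts (a dict keyed by Node)."""
    path = [root]
    for _ in range(depth):
        path.append(user_step(path[-1], counts, alpha, gamma, rng))
    for node in path[1:]:
        counts[node] = counts.get(node, 0) + 1
    return path


# Usage: two hypothetical users, 20 posts each, paths of depth 3.
rng = random.Random(0)
root = Node(depth=0)
user_counts = {"alice": {}, "bob": {}}
paths = []
for _ in range(20):
    for counts in user_counts.values():
        paths.append(sample_path(root, counts, depth=3, alpha=1.0, gamma=0.5, rng=rng))
print("top-level branches:", len(root.children))
```

This mirrors the split described above. Structure is shared through the global counts, because every user's first post must defer to the shared tree. Personal preferences persist through each user's own counts, which concentrate their later posts on branches they already frequent.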