Online social networks are known to be demographically biased. It is still an open question to what degree they represent the physical population, and how population biases affect user-generated content. In this paper we focus on centralism, a problem affecting Chile. Assuming that a country exhibits local differences in vocabulary, we built a methodology based on the vector space model to find distinctive content from different locations. We used it to create classifiers that predict whether the content of a micro-post is related to a particular location, using a geographically diverse selection of micro-posts. We evaluate them in a case study of the virtual population of Chile that participated in the Twitter social network during an event of national relevance: the 2012 municipal (local government) elections. We observe that the participating virtual population is spatially representative of the physical population, which implies that centralism is also present on Twitter. Our classifiers outperform a non-geographically-diverse baseline at the regional level and match its accuracy at the provincial level. However, our approach makes assumptions that still need to be tested on multi-thematic and more general datasets. We leave this for future work.
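The abstract does not say which term weighting, similarity function, or sampling scheme the methodology uses. The following is a minimal sketch under our own assumptions, not the authors' method. Each location's micro-posts are pooled into one "document" and weighted with TF-IDF across locations, so vocabulary shared by every location gets zero weight and only location-distinctive terms remain. A new micro-post is then assigned to the location whose vector has the highest cosine similarity. The `max_per_location` cap is a hypothetical way to keep the training selection geographically balanced. The example posts are made-up toy data, not drawn from the study.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately simple tokenizer: lowercase, then split on non-word characters.
    return re.findall(r"\w+", text.lower())


def build_location_vectors(posts, max_per_location=None):
    """posts: list of (location, text) pairs; all posts of a location are pooled.
    max_per_location (hypothetical) caps how many posts each location contributes."""
    counts = defaultdict(Counter)
    seen = Counter()
    for location, text in posts:
        if max_per_location is not None and seen[location] >= max_per_location:
            continue
        seen[location] += 1
        counts[location].update(tokenize(text))
    n_locations = len(counts)
    # Document frequency: the number of locations in which each term occurs.
    df = Counter()
    for tf in counts.values():
        df.update(tf.keys())
    # TF-IDF: a term used in every location gets idf = log(1) = 0.
    vectors = {
        loc: {t: f * math.log(n_locations / df[t]) for t, f in tf.items()}
        for loc, tf in counts.items()
    }
    return vectors, df, n_locations


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(text, vectors, df, n_locations):
    # Weight the micro-post with the same idf; terms never seen in training are ignored.
    tf = Counter(tokenize(text))
    query = {t: f * math.log(n_locations / df[t]) for t, f in tf.items() if t in df}
    scores = {loc: cosine(query, vec) for loc, vec in vectors.items()}
    best = max(scores, key=scores.get)
    # Return None when the post contains no location-distinctive vocabulary.
    return best if scores[best] > 0 else None


# Made-up toy posts for illustration only.
posts = [
    ("Santiago", "votando en providencia y el metro lleno"),
    ("Santiago", "el metro de santiago colapsado hoy"),
    ("Valparaiso", "votando en el cerro alegre del puerto"),
    ("Valparaiso", "el puerto y los ascensores de valparaiso"),
]

if __name__ == "__main__":
    vectors, df, n = build_location_vectors(posts)
    print(classify("mucha gente en el metro", vectors, df, n))
```

In this toy run, words such as "votando", "en" and "el" occur in both locations and receive zero weight, so "mucha gente en el metro" is assigned to Santiago through the single distinctive term "metro". A post made only of shared words returns `None` rather than an arbitrary location.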