The increasing pervasiveness of social media creates new opportunities to study human social behavior, while challenging our ability to analyze the massive data streams it produces. One of the emerging tasks is to distinguish between different kinds of activities, for example engineered misinformation campaigns versus spontaneous communication. Such detection problems require a formal definition of a meme, that is, a unit of information that can spread from person to person through the social network. Once a meme is identified, supervised learning methods can be applied to classify different types of communication. The appropriate granularity of a meme, however, is hardly captured by existing entities such as tags and keywords. Here we present a framework for the novel task of detecting memes by clustering messages from large streams of social data. We evaluate various similarity measures that leverage content, metadata, network features, and their combinations. We also explore the idea of pre-clustering on the basis of existing entities. A systematic evaluation is carried out using a manually curated dataset as ground truth. Our analysis shows that pre-clustering and a combination of heterogeneous features yield the best trade-off between the number of clusters and their quality. It also shows that a simple combination based on pairwise maximization of similarity is as effective as a non-trivial optimization of parameters. Our approach is fully automatic, unsupervised, and scalable for real-time detection of memes in streaming data.
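The idea of combining heterogeneous similarity measures by pairwise maximization can be illustrated with a minimal sketch. The abstract does not specify the actual measures, features, or clustering algorithm, so everything below is an assumption made for illustration: the feature names (`words`, `hashtags`, `mentions`), the use of Jaccard similarity for every feature, the threshold value, the single-link grouping, and the toy messages are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy messages; each feature is a set of tokens.
MESSAGES = [
    {"words": {"vote", "early", "today"}, "hashtags": {"#election"}, "mentions": set()},
    {"words": {"polls", "open", "now"}, "hashtags": {"#election"}, "mentions": set()},
    {"words": {"new", "phone", "launch"}, "hashtags": {"#tech"}, "mentions": {"@acme"}},
    {"words": {"new", "phone", "launch", "wow"}, "hashtags": set(), "mentions": set()},
]


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def max_similarity(m1, m2, features=("words", "hashtags", "mentions")):
    """Combine per-feature similarities by taking their pairwise maximum."""
    return max(jaccard(m1[f], m2[f]) for f in features)


def cluster(messages, threshold=0.5):
    """Single-link grouping (an assumed choice): messages whose combined
    similarity reaches the threshold end up in the same cluster."""
    parent = list(range(len(messages)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(messages)), 2):
        if max_similarity(messages[i], messages[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(messages)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    print(cluster(MESSAGES))
```

In this toy data the first two messages share no words but are linked through a common hashtag, while the last two share no hashtag but are linked through overlapping words. This is the kind of case where taking the maximum over feature types helps, since no per-feature weights need to be tuned. The all-pairs loop is quadratic and is only meant to show the combination rule, not a streaming-scale implementation.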