Activity in social media such as blogs, micro-blogs and social networks is manifested through interactions that involve text, images, links and other information items. Naturally, some items attract more attention than others, expressed, for example, through large volumes of linking, commenting or tagging activity. Moreover, high attention can be indicative of emerging events or breaking news, or more generally can identify information items of interest to a vast number of people. The numbers associated with digital social activity are astonishing: in excess of millions of blog posts, tweets and forum updates per day, and millions of tags on photos, news articles and blogs. Identifying the information items that gather much attention in such a real-time information collective is a challenging task. In this paper, we consider the problem of early online identification of items that gather a lot of attention in social media. We model social media activity using ISIS, a stochastic model for Interacting Streaming Information Sources, which intuitively captures the concept of attention-gathering information items. Given the information overload that characterizes digital social activity, we present sequential statistical tests that enable early identification of attention-gathering items. This effectively reduces the set of items one has to monitor in real time in order to identify pieces of information attracting a lot of attention. Experiments on real data demonstrate the utility of our model, as well as the efficiency and effectiveness of the proposed sequential statistical tests.
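To illustrate the general idea of a sequential statistical test for early identification, here is a minimal sketch of Wald's sequential probability ratio test (SPRT). This is a generic, swapped-in technique and not the ISIS model or the authors' tests. It assumes, purely for illustration, that an item's activity count per time interval is Poisson distributed with either an ordinary rate `rate0` or an elevated "attention-gathering" rate `rate1`; the function name `sprt_poisson` and all parameters are hypothetical.

```python
import math
from typing import Iterable, Optional, Tuple


def sprt_poisson(counts: Iterable[int], rate0: float, rate1: float,
                 alpha: float = 0.05, beta: float = 0.05) -> Tuple[Optional[str], int]:
    """Hypothetical sketch: Wald's SPRT on per-interval activity counts.

    Assumed hypotheses (illustrative, not from the paper):
      H0: counts ~ Poisson(rate0)  -> ordinary item
      H1: counts ~ Poisson(rate1)  -> attention-gathering item, rate1 > rate0
    alpha / beta are the target false-positive / false-negative rates.
    Returns (decision, intervals consumed); decision is None if the
    stream ends before either threshold is crossed.
    """
    if not 0 < rate0 < rate1:
        raise ValueError("require 0 < rate0 < rate1")
    upper = math.log((1 - beta) / alpha)   # crossing it accepts H1
    lower = math.log(beta / (1 - alpha))   # crossing it accepts H0
    step_log = math.log(rate1 / rate0)
    llr = 0.0
    n = 0
    for n, x in enumerate(counts, start=1):
        # Poisson log-likelihood ratio for one interval; the x! terms cancel.
        llr += x * step_log - (rate1 - rate0)
        if llr >= upper:
            return "attention", n
        if llr <= lower:
            return "ordinary", n
    return None, n
```

For example, `sprt_poisson([12, 15, 11], rate0=2.0, rate1=10.0)` stops after the first interval with `("attention", 1)`, while `sprt_poisson([0, 1, 0], rate0=2.0, rate1=10.0)` returns `("ordinary", 1)`. Because most items are settled after a few intervals, only the undecided ones need continued monitoring, which mirrors the abstract's point about shrinking the set of items tracked in real time.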