Early online identification of attention gathering items in social media

  • Authors:
  • Michael Mathioudakis;Nick Koudas;Peter Marbach

  • Affiliations:
  • University of Toronto, Toronto, ON, Canada;University of Toronto, Toronto, ON, Canada;University of Toronto, Toronto, ON, Canada

  • Venue:
  • Proceedings of the third ACM international conference on Web search and data mining
  • Year:
  • 2010

Quantified Score

Hi-index 0.02

Visualization

Abstract

Activity in social media such as blogs, micro-blogs, social networks, etc is manifested via interaction that involves text, images, links and other information items. Naturally, some items attract more attention than others, expressed with large volumes of linking, commenting or tagging activity, to name a few examples. Moreover, high attention can be indicative of emerging events, breaking news or generally indicate information items of interest to a vast set of people. The numbers associated with digital social activity are astonishing: in excess of millions of blog posts, tweets and forums updates per day, millions of tags in photos, news articles or blogs. Being able to identify information items that gather much attention in such a real time information collective is a challenging task. In this paper, we consider the problem of early online identification of items that gather a lot of attention in social media. We model social media activity using ISIS, a stochastic model for Interacting Streaming Information Sources, that intuitively captures the concept of attention gathering information items. Given the challenge of the information overload characterizing digital social activity, we present sequential statistical tests that enable early identification of attention gathering items. This effectively reduces the set of items one has to monitor in real time in order to identify pieces of information attracting a lot of attention. Experiments on real data demonstrate the utility of our model, as well as the efficiency and effectiveness of the proposed sequential statistical tests.