ThemeCrowds: multiresolution summaries of twitter usage

  • Authors:
  • Daniel Archambault; Derek Greene; Pádraig Cunningham; Neil Hurley

  • Affiliations:
  • University College Dublin, Dublin, Ireland; University College Dublin, Dublin, Ireland; University College Dublin, Dublin, Ireland; University College Dublin, Dublin, Ireland

  • Venue:
  • Proceedings of the 3rd international workshop on Search and mining user-generated contents
  • Year:
  • 2011

Visualization

Abstract

Users of social media sites, such as Twitter, rapidly generate large volumes of text content on a daily basis. Visual summaries are needed to understand what groups of people are saying collectively in this unstructured text data. Users will typically discuss a wide variety of topics, where the number of authors talking about a specific topic can quickly grow or diminish over time, and what the collective is saying about the subject can shift as a situation develops. In this paper, we present a technique that summarises what collections of Twitter users are saying about certain topics over time. As the correct resolution for inspecting the data is unknown in advance, the users are clustered hierarchically over a fixed time interval based on the similarity of their posts. The visualisation technique takes this data structure as its input. Given a topic, it finds the correct resolution of users at each time interval and provides tags to summarise what the collective is discussing. The technique is tested on a large microblogging corpus, consisting of millions of tweets and over a million users.
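
The pipeline outlined in the abstract (cluster users hierarchically per time interval, then pick a resolution for a given topic and tag the result) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm, the similarity measure, or the rule for choosing a resolution. The sketch assumes bag-of-words cosine similarity, greedy agglomerative merging, and a hypothetical `min_share` threshold on the fraction of users in a cluster who mention the topic. All function names are illustrative.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_users(posts):
    """Build a binary cluster tree over users for ONE time interval.

    posts maps user -> list of tokens and must be non-empty.
    Assumption: greedy agglomerative merging on merged term counts.
    """
    nodes = [{"users": [u], "terms": Counter(toks), "children": []}
             for u, toks in sorted(posts.items())]
    while len(nodes) > 1:
        # Find the pair of clusters whose term vectors are most similar.
        i, j = max(((i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))),
                   key=lambda p: cosine(nodes[p[0]]["terms"], nodes[p[1]]["terms"]))
        a, b = nodes[i], nodes[j]
        merged = {"users": a["users"] + b["users"],
                  "terms": a["terms"] + b["terms"],
                  "children": [a, b]}
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


def select_resolution(node, posts, topic, min_share=0.6):
    """Return the largest clusters in which at least min_share of users mention topic.

    Assumption: min_share is a hypothetical threshold, not taken from the paper.
    """
    share = sum(topic in posts[u] for u in node["users"]) / len(node["users"])
    if share >= min_share:
        return [node]
    selected = []
    for child in node["children"]:
        selected += select_resolution(child, posts, topic, min_share)
    return selected


def tags(node, k=3):
    """Summarise a cluster by its k most frequent terms."""
    return [term for term, _ in node["terms"].most_common(k)]


if __name__ == "__main__":
    # Toy data for a single time interval; real input would be tokenised tweets.
    posts = {
        "ann": ["flood", "dublin", "rain", "flood"],
        "bob": ["flood", "rain", "dublin"],
        "cat": ["match", "goal", "rugby"],
        "dan": ["rugby", "match", "flood"],
    }
    tree = cluster_users(posts)
    for cluster in select_resolution(tree, posts, "flood", min_share=0.8):
        print(cluster["users"], tags(cluster))
```

Running the same selection on the tree built for each successive time interval would give one set of tagged clusters per interval, which is the kind of input a timeline-style visual summary could consume. The pairwise search in `cluster_users` is far too slow for a corpus with over a million users; it is kept only to make the idea readable.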