Real time discussion retrieval from twitter

  • Authors:
  • Dmitrijs Milajevs;Gosse Bouma

  • Affiliations:
  • University of Groningen, Groningen, Netherlands;University of Groningen, Groningen, Netherlands

  • Venue:
  • Proceedings of the 22nd international conference on World Wide Web companion
  • Year:
  • 2013

Abstract

While social media receive a lot of attention from the scientific community, there is little work on high-recall retrieval of messages relevant to a discussion. Hashtag-based search is widely used for data retrieval from social media. This work shows the limitations of that approach: the majority of relevant messages do not contain any hashtag, and unpredictable hashtags come into use as the conversation evolves over time. To overcome these limitations, we propose an alternative retrieval method. Given an input stream of messages as an example of the discussion, our method extracts the most relevant words from it and queries the social network for more messages containing these words. It then uses an LDA topic model to filter out messages that do not belong to the discussion. We demonstrate this concept on manually built collections of tweets about major sport and music events.
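
The keyword-extraction and query steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say how word relevance is scored, so the smoothed log-ratio against a background sample used here is an assumption, as are the function names `tokenize`, `top_keywords`, and `build_query` and the `OR`-joined query format. The LDA filtering step is not shown.

```python
import math
import re
from collections import Counter

# Keep a leading '#' or '@' so hashtags and mentions stay distinct tokens.
TOKEN_RE = re.compile(r"[#@]?\w+")


def tokenize(text):
    """Lowercase a message and split it into word-like tokens."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def top_keywords(discussion, background, k=5):
    """Return the k words most characteristic of the discussion sample.

    Assumed scoring (not taken from the paper): frequency in the
    discussion times the log-ratio of add-one-smoothed relative
    frequencies in the discussion versus the background sample.
    """
    disc = Counter(w for msg in discussion for w in tokenize(msg))
    back = Counter(w for msg in background for w in tokenize(msg))
    disc_total = sum(disc.values())
    back_total = sum(back.values())
    vocab_size = len(set(disc) | set(back))

    def score(word):
        p_disc = (disc[word] + 1) / (disc_total + vocab_size)
        p_back = (back[word] + 1) / (back_total + vocab_size)
        return disc[word] * math.log(p_disc / p_back)

    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(disc, key=lambda w: (-score(w), w))
    return ranked[:k]


def build_query(keywords):
    """Join keywords into a hypothetical OR-style search query string."""
    return " OR ".join(keywords)
```

Calling `top_keywords(sample_messages, background_messages, k=10)` on a sample of the discussion and passing the result to `build_query` yields a query string that could be sent to a search endpoint. The retrieved messages would then still need the topic-model filter the abstract describes.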