Exploiting topic tracking in real-time tweet streams

  • Authors: Yihong Hong; Yue Fei; Jianwu Yang

  • Affiliations: Peking University, Beijing, China (all authors)

  • Venue: Proceedings of the 2013 international workshop on Mining unstructured big data using natural language processing
  • Year: 2013

Abstract

Microblogs such as Twitter have become an increasingly popular source of real-time information. Users want to keep up to date with developments in the topics that interest them. In this paper, we present an effective real-time tweet filtering system that exploits topic tracking in social media streams. We combine a background corpus with a foreground corpus to handle the cold-start problem. We then build a Content Model to describe the characteristics of tweets: it uses link information to expand each tweet's content and thereby enrich its semantic information, and it accounts for the influence of tweet quality, measured by a group of well-defined symbols. Moreover, a Pseudo Relevance Feedback approach, triggered by a fixed-width temporal sliding window, adapts the system to changes in topics over time. Experimental results on the Tweet11 corpus indicate that our system performs well on both the T11SU and F-0.5 metrics and outperforms the best system in the TREC 2012 real-time filtering pilot task.
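The abstract names the T11SU and F-0.5 metrics without defining them. The sketch below shows how they are commonly computed from per-topic filtering counts. It assumes the usual TREC filtering-track definitions (linear utility T11U = 2·TP − FP, normalised by the maximum utility 2·(TP + FN) and scaled with a floor of −0.5); the paper's exact evaluation setup is not given here, and the function names `f_beta` and `t11su` are illustrative only.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from counts; beta = 0.5 weights precision more than recall."""
    if tp == 0:
        # No relevant tweet was retrieved, so the score is taken to be 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def t11su(tp: int, fp: int, fn: int, min_nu: float = -0.5) -> float:
    """Scaled linear utility (assumed TREC filtering-track definition)."""
    max_u = 2 * (tp + fn)  # utility of retrieving exactly the relevant tweets
    if max_u == 0:
        raise ValueError("topic has no relevant tweets; utility is undefined")
    t11u = 2 * tp - fp          # reward 2 per hit, penalty 1 per false alarm
    t11nu = t11u / max_u        # normalise by the best achievable utility
    # Clip at the floor and rescale so the result lies in [0, 1].
    return (max(t11nu, min_nu) - min_nu) / (1 - min_nu)


if __name__ == "__main__":
    # Hypothetical counts for one topic: 8 hits, 2 false alarms, 2 misses.
    print(f_beta(8, 2, 2), t11su(8, 2, 2))
```

Under these definitions, a system that retrieves nothing scores 1/3 on T11SU rather than 0, so the metric rewards staying silent over retrieving many irrelevant tweets, while F-0.5 gives such a system 0.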