Spatio-temporal characteristics of bursty words in Twitter streams

  • Authors:
  • Hamed Abdelhaq; Michael Gertz; Christian Sengstock

  • Affiliations:
  • Heidelberg University, Germany; Heidelberg University, Germany; Heidelberg University, Germany

  • Venue:
  • Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
  • Year:
  • 2013

Abstract

Social networking and microblogging services such as Twitter provide a continuous source of data from which useful information can be extracted. The detection and characterization of bursty words play an important role in processing such data, as bursty words may hint at events or trending topics of social importance upon which actions can be triggered. While there are several approaches to extracting bursty words from the content of messages, there is little work that deals with the dynamics of continuous message streams, in particular streams of geo-tagged messages. In this paper, we present a framework to identify bursty words in Twitter text streams and to describe such words in terms of their spatio-temporal characteristics. Using a time-aware word usage baseline, we propose a sliding-window approach over incoming tweets that identifies words satisfying a burstiness threshold. For each of these words, a time-varying spatial signature is then determined, which relies primarily on geo-tagged tweets. To deal with the noise and sparsity of geo-tagged tweets, we propose a novel graph-based regularization procedure that uses spatial co-occurrences of bursty words and allows for computing sound spatial signatures. We evaluate the functionality of our online processing framework using two real-world Twitter datasets. The results show that our framework can efficiently and reliably extract bursty words and describe their spatio-temporal evolution.
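The burst-detection step described above (a sliding window over incoming tweets compared against a time-aware baseline) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the baseline or the burstiness measure, so this sketch assumes a baseline made of past per-window word counts grouped by hour of day, a z-score as the burstiness measure, and an arbitrary threshold. The names `SlidingBurstDetector`, `add_history`, `push`, and `bursty_words` are hypothetical. The sketch covers only burst detection, not the spatial signatures or the graph-based regularization.

```python
from collections import Counter, defaultdict, deque
from statistics import mean, pstdev


class SlidingBurstDetector:
    """Illustrative sliding-window burst detector (hypothetical, not the paper's method).

    Assumed baseline: for each hour of day, a list of word counts observed in
    past windows at that hour. A word is flagged as bursty when its count in the
    current window lies at least `z_threshold` standard deviations above the
    baseline mean for the same hour.
    """

    def __init__(self, window_seconds, z_threshold=3.0, min_count=3):
        self.window_seconds = window_seconds  # window length in seconds (assumed)
        self.z_threshold = z_threshold        # hypothetical burstiness threshold
        self.min_count = min_count            # skip words too rare to judge
        self.window = deque()                 # (timestamp, set of words) per tweet
        self.counts = Counter()               # number of tweets containing each word
        self.baseline = defaultdict(list)     # hour of day -> list of past Counters

    def add_history(self, hour, counts):
        """Record the word counts of one past window observed at `hour` (0-23)."""
        self.baseline[hour].append(Counter(counts))

    def push(self, timestamp, tokens):
        """Add one tweet (timestamps non-decreasing) and evict tweets that left the window."""
        words = set(tokens)  # count each word at most once per tweet
        self.window.append((timestamp, words))
        self.counts.update(words)
        while self.window and self.window[0][0] <= timestamp - self.window_seconds:
            _, old = self.window.popleft()
            self.counts.subtract(old)
            for w in old:
                if self.counts[w] <= 0:
                    del self.counts[w]  # keep the Counter free of zero entries

    def bursty_words(self, hour):
        """Return {word: z-score} for words in the current window above the threshold."""
        history = self.baseline.get(hour, [])
        if not history:
            return {}  # no baseline for this hour yet
        result = {}
        for word, count in self.counts.items():
            if count < self.min_count:
                continue
            past = [c[word] for c in history]  # words absent from a past window count as 0
            # Floor the deviation at 1.0 so words with a flat history avoid division by zero.
            sigma = max(pstdev(past), 1.0)
            z = (count - mean(past)) / sigma
            if z >= self.z_threshold:
                result[word] = z
        return result
```

After loading per-hour history with `add_history` and pushing tweets in timestamp order, `bursty_words(hour)` returns the words whose current-window frequency stands out against that hour's history. Grouping the baseline by hour of day is just one simple way to make it time-aware, and the floor on the standard deviation is an illustrative choice that keeps words with a flat history from producing unbounded scores.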