Event identification in web social media through named entity recognition and topic modeling

  • Authors:
  • Konstantinos N. Vavliakis;Andreas L. Symeonidis;Pericles A. Mitkas

  • Affiliations:
  • -;-;-

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

The problem of identifying important online or real life events from large textual document streams that are freely available on the World Wide Web is increasingly gaining popularity, given the flourishing of the social web. An event triggers discussion and comments on the WWW, especially in the blogosphere and in microblogging services. Consequently, one should be able to identify the involved entities, topics, time, and location of events through the analysis of information publicly available on the web, create semantically rich representations of events, and then use this information to provide interesting results, or summarize news to users. In this paper, we define the concept of important event and propose an efficient methodology for performing event detection from large time-stamped web document streams. The methodology successfully integrates named entity recognition, dynamic topic map discovery, topic clustering, and peak detection techniques. In addition, we propose an efficient algorithm for detecting all important events from a document stream. We perform extensive evaluation of the proposed methodology and algorithm on a dataset of 7million blogposts, as well as through an international social event detection challenge. The results provide evidence that our approach: a) accurately detects important events, b) creates semantically rich representations of the detected events, c) can be adequately parameterized to correspond to different social perceptions of the event concept, and d) is suitable for online event detection on very large datasets. The expected complexity of the online facet of the proposed algorithm is linear with respect to the number of documents in the data stream.