Mining event temporal boundaries from news corpora through evolution phase discovery

  • Authors:
  • Liang Kong; Rui Yan; Han Jiang; Yan Zhang; Yan Gao; Li Fu

  • Affiliations:
  • Department of Machine Intelligence, Peking University and Key Laboratory on Machine Perception, Ministry of Education, Beijing, China; Department of Computer Science, Peking University, Beijing, China; Department of Computer Science, Peking University, Beijing, China; Department of Machine Intelligence, Peking University and Key Laboratory on Machine Perception, Ministry of Education, Beijing, China; Service Software Chongqing Institute of ZTE Corporation, China; Service Software Chongqing Institute of ZTE Corporation, China

  • Venue:
  • WAIM'11 Proceedings of the 12th international conference on Web-age information management
  • Year:
  • 2011

Abstract

News now floods the web. Event Detection and Tracking techniques make it feasible to gather and structure text into events that are constructed online automatically and updated over time. Users are usually eager to browse the whole evolution of an event, but given the huge quantity of documents it is almost impossible for them to read everything. In this paper, we formally define the problem of event evolution phase discovery. We introduce a novel and principled model, called EPD, that aims to outline the entire development of a news event over time. A news document is usually not atomic but consists of independent news segments related to the same event. Therefore, we first employ a latent ingredient extraction method to extract event snippets. Unlike traditional clustering methods, we propose a novel metric that integrates content, temporal, distribution and bursty features to measure the correlation between snippets along the timeline of a specific event. We also introduce a novel word-weighting method that incorporates the bursty feature. We employ hierarchical agglomerative clustering (HAC) to group the news snippets into diversified phases. An optimization problem is formulated to decide the number of phases, which makes EPD practical to apply. Using our novel evaluation method, empirical experiments on two real datasets show that EPD is effective and outperforms various related algorithms. Automatic event chronicle generation is introduced as a typical application of EPD.
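
The abstract does not give the formulas behind EPD, so the following is only a minimal sketch of the general idea of grouping time-stamped snippets into phases with HAC. Everything specific in it is an assumption made for illustration: the snippet fields `day` and `text`, the weights `alpha` and `tau`, the exponential time decay, and the average-linkage rule. It blends only a content feature (cosine similarity) with a temporal feature, leaves out the distribution and bursty features, and takes the number of phases as a fixed input rather than deciding it through optimization as the paper does.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def snippet_similarity(s1, s2, alpha=0.5, tau=3.0):
    """Assumed blend of content similarity and temporal closeness.

    alpha weights the content term; tau (in days) controls how fast
    the temporal term decays. Both are hypothetical parameters.
    """
    content = cosine(Counter(s1["text"].lower().split()),
                     Counter(s2["text"].lower().split()))
    temporal = math.exp(-abs(s1["day"] - s2["day"]) / tau)
    return alpha * content + (1 - alpha) * temporal


def hac_phases(snippets, num_phases):
    """Average-linkage agglomerative clustering down to num_phases groups.

    Returns a sorted list of phases, each a sorted list of snippet indices.
    """
    clusters = [[i] for i in range(len(snippets))]

    def linkage(c1, c2):
        total = sum(snippet_similarity(snippets[i], snippets[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > num_phases:
        # Merge the pair of clusters with the highest average similarity.
        a, b = max(((x, y) for x in range(len(clusters))
                    for y in range(x + 1, len(clusters))),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Toy snippets (invented for this example, not taken from the paper's data).
snippets = [
    {"day": 0, "text": "earthquake strikes coastal city"},
    {"day": 1, "text": "earthquake damage reported in city"},
    {"day": 10, "text": "rescue teams search for survivors"},
    {"day": 11, "text": "rescue teams find survivors"},
]
phases = hac_phases(snippets, 2)
print(phases)  # [[0, 1], [2, 3]]
```

On this toy input the two early snippets and the two later ones fall into separate phases, because they are close in both wording and time. A fuller treatment would replace `snippet_similarity` with the paper's four-feature metric and burst-aware word weights.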