Two-tier similarity model for story link detection

  • Authors:
  • Tadashi Nomoto

  • Affiliations:
  • National Institute of Japanese Literature, Tachikawa, Japan

  • Venue:
  • CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

The paper presents a novel approach to story link detection, where the goal is to determine whether a pair of news stories are linked, i.e., talk about the same event. The present work marks a departure from the prior work in that we measure similarity at two distinct levels of textual organization, the document and its collection, and combine scores at both levels to determine how well stories are linked. Experiments on the TDT-5 corpus show that the present approach, which we call a 'two-tier similarity model,' comfortably beats conventional approaches such as Clarity enhanced KL divergence, while performing robustly across diverse languages.