Where to start filtering redundancy?: a cluster-based approach

  • Authors:
  • Ronald T. Fernandez; Javier Parapar; David E. Losada; Alvaro Barreiro

  • Affiliations:
  • University of Santiago de Compostela, Santiago de Compostela, Spain; University of A Coruña, A Coruña, Spain; University of Santiago de Compostela, Santiago de Compostela, Spain; University of A Coruña, A Coruña, Spain

  • Venue:
  • Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval
  • Year:
  • 2010

Abstract

Novelty detection is a difficult task, particularly at the sentence level. Most approaches proposed in the past re-order all sentences according to their novelty scores. However, this re-ordering usually has little value. In fact, a naive baseline with no novelty detection capabilities often yields better performance than any state-of-the-art novelty detection mechanism. We argue that this is because current methods start the novelty detection process too early. When only a few sentences have been seen, the user is unlikely to be negatively affected by redundancy. Therefore, re-ordering the first sentences may harm performance. Here we propose a query-dependent method, based on cluster analysis, to determine where we must start filtering redundancy.
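The abstract does not describe how the cluster analysis picks the starting point, so the sketch below does not attempt that step: the cutoff `start` is simply passed in as a parameter. It only illustrates the idea of delaying novelty-based re-ordering. The first `start` sentences keep their original (relevance) order, and the remaining ones are greedily re-ordered by novelty with respect to everything already shown. The bag-of-words cosine novelty score and the names `cosine`, `novelty` and `rerank_from` are illustrative assumptions, not the authors' method.

```python
from collections import Counter
import math


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def novelty(candidate: Counter, seen: list[Counter]) -> float:
    """Assumed novelty score: 1 minus the max similarity to any sentence already shown."""
    if not seen:
        return 1.0
    return 1.0 - max(cosine(candidate, s) for s in seen)


def rerank_from(sentences: list[str], start: int) -> list[str]:
    """Keep the first `start` sentences in their original order, then greedily
    re-order the rest by novelty. `start` is a hypothetical parameter here; the
    paper's cluster-based choice of this cutoff is not reproduced."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    order = list(range(min(start, len(sentences))))  # untouched prefix
    remaining = list(range(len(order), len(sentences)))
    while remaining:
        seen = [vecs[i] for i in order]
        # max() returns the first maximal element, so ties keep the
        # higher-ranked (earlier) sentence.
        best = max(remaining, key=lambda i: novelty(vecs[i], seen))
        order.append(best)
        remaining.remove(best)
    return [sentences[i] for i in order]
```

With `start=0` this degenerates into the "re-order everything" behaviour the abstract criticizes. Larger values leave the top of the ranking untouched and apply redundancy filtering only further down the list.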