Sentence length bias in TREC novelty track judgements

  • Authors:
  • Lorena Leal Bando; Falk Scholer; Andrew Turpin

  • Affiliations:
  • RMIT University, Melbourne, Australia; RMIT University, Melbourne, Australia; University of Melbourne, Melbourne, Australia

  • Venue:
  • Proceedings of the Seventeenth Australasian Document Computing Symposium
  • Year:
  • 2012

Abstract

The Cranfield methodology for comparing document ranking systems has recently also been applied to comparing sentence ranking methods, which serve as pre-processors for summary generation. In particular, the TREC Novelty track data has been used to assess whether one sentence ranking system is better than another. This paper demonstrates a strong bias in the Novelty track data: relevant sentences also tend to be longer sentences. As a result, systems that simply choose the longest sentences will often appear to identify "relevant" sentences better than systems that use other methods. We demonstrate, by example, how this can lead to misleading conclusions about the comparative effectiveness of sentence ranking systems. We then show that if the Novelty track data is split into subcollections based on sentence length, comparing systems within each subcollection leads to conclusions that avoid the bias.
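The effect described in the abstract can be illustrated with a minimal sketch. It assumes each sentence has a precomputed length in words and a binary relevance label, uses precision at k as the effectiveness measure, and splits the collection at a hypothetical 10-word boundary; the paper's actual measures and length split are not stated here. The sentence data and the scores of the second ranker (`system_b`) are invented toy values chosen only to show how pooled and per-length comparisons can disagree. They are not Novelty track judgements or results from the paper.

```python
from typing import Dict, List

# Hypothetical toy data, NOT TREC Novelty judgements: length in words,
# binary relevance, and a score from some other (hypothetical) ranker "B".
# Long sentences are mostly relevant, mimicking the bias the abstract reports.
SENTENCES: List[Dict] = [
    {"id": "s1", "length": 6,  "relevant": 1, "score_b": 0.95},
    {"id": "s2", "length": 7,  "relevant": 0, "score_b": 0.90},
    {"id": "s3", "length": 8,  "relevant": 0, "score_b": 0.85},
    {"id": "s4", "length": 9,  "relevant": 0, "score_b": 0.10},
    {"id": "s5", "length": 22, "relevant": 1, "score_b": 0.80},
    {"id": "s6", "length": 25, "relevant": 1, "score_b": 0.70},
    {"id": "s7", "length": 30, "relevant": 0, "score_b": 0.05},
    {"id": "s8", "length": 34, "relevant": 1, "score_b": 0.60},
]


def rank_longest_first(sentences: List[Dict]) -> List[Dict]:
    """Length-only baseline: rank sentences longest first."""
    return sorted(sentences, key=lambda s: s["length"], reverse=True)


def rank_by_score_b(sentences: List[Dict]) -> List[Dict]:
    """Rank by the hypothetical system B's scores, highest first."""
    return sorted(sentences, key=lambda s: s["score_b"], reverse=True)


def precision_at_k(ranked: List[Dict], k: int) -> float:
    """Fraction of the top-k ranked sentences that are judged relevant."""
    top = ranked[:k]
    return sum(s["relevant"] for s in top) / len(top) if top else 0.0


def split_by_length(sentences: List[Dict], boundaries: List[int]) -> List[List[Dict]]:
    """Put each sentence into a length subcollection.

    Bucket i holds sentences longer than i of the (sorted) word-count
    boundaries, so boundaries=[10] gives <=10 words and >10 words.
    """
    buckets: List[List[Dict]] = [[] for _ in range(len(boundaries) + 1)]
    for s in sentences:
        buckets[sum(s["length"] > b for b in boundaries)].append(s)
    return buckets


def compare(sentences: List[Dict], k: int) -> Dict[str, float]:
    """Precision at k of both rankers on the same set of sentences."""
    return {
        "longest_first": precision_at_k(rank_longest_first(sentences), k),
        "system_b": precision_at_k(rank_by_score_b(sentences), k),
    }


if __name__ == "__main__":
    print("pooled", compare(SENTENCES, k=4))
    for i, bucket in enumerate(split_by_length(SENTENCES, boundaries=[10])):
        print("bucket", i, compare(bucket, k=2))
```

On this contrived data the longest-first baseline beats `system_b` when all sentences are pooled (precision at 4 of 0.75 versus 0.5), yet loses within both length subcollections (0.0 versus 0.5 for short sentences, 0.5 versus 1.0 for long ones). This is the kind of reversal the abstract warns about when length and relevance are correlated.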