Sentence length bias in TREC novelty track judgements

Authors:
Lorena Leal Bando;Falk Scholer;Andrew Turpin
Affiliations:
RMIT University, Melbourne, Australia;RMIT University, Melbourne, Australia;University of Melbourne, Melbourne, Australia
Venue:
Proceedings of the Seventeenth Australasian Document Computing Symposium
Year:
2012

Abstract

The Cranfield methodology for comparing document ranking systems has recently also been applied to comparing sentence ranking methods, which are used as pre-processors for summary generation. In particular, the TREC Novelty track data has been used to assess whether one sentence ranking system is better than another. This paper demonstrates that the Novelty track data is strongly biased: sentences judged relevant also tend to be longer. As a result, systems that simply choose the longest sentences will often appear to perform better at identifying "relevant" sentences than systems that use other methods. We demonstrate, by example, how this can lead to misleading conclusions about the comparative effectiveness of sentence ranking systems. We then demonstrate that if the Novelty track data is split into subcollections based on sentence length, comparing systems on each subcollection leads to conclusions that avoid the bias.
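The idea of evaluating on length-based subcollections can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the bucket edges (10 and 20 words), the whitespace tokenisation, and the use of precision at k as the effectiveness measure are all assumptions made for illustration. Sentences are represented as hypothetical `(sentence_id, text, is_relevant)` triples.

```python
from collections import defaultdict


def length_bucket(text, edges=(10, 20)):
    """Return the index of the length bucket (in words) that `text` falls into.

    The edges are illustrative assumptions, not values from the paper.
    """
    n_words = len(text.split())
    for i, edge in enumerate(edges):
        if n_words <= edge:
            return i
    return len(edges)


def split_by_length(sentences, edges=(10, 20)):
    """Group (sentence_id, text, is_relevant) triples into length-based subcollections."""
    buckets = defaultdict(list)
    for sid, text, rel in sentences:
        buckets[length_bucket(text, edges)].append((sid, text, rel))
    return dict(buckets)


def precision_at_k(ranked, k):
    """Fraction of the top-k ranked sentences that are judged relevant."""
    top = ranked[:k]
    return sum(rel for _, _, rel in top) / len(top) if top else 0.0


def rank_by_length(sentences):
    """Baseline 'system' that simply prefers longer sentences."""
    return sorted(sentences, key=lambda s: len(s[1].split()), reverse=True)


def evaluate_per_bucket(sentences, ranker, k=2, edges=(10, 20)):
    """Score a ranker separately on each length-based subcollection."""
    groups = split_by_length(sentences, edges)
    return {b: precision_at_k(ranker(group), k) for b, group in sorted(groups.items())}
```

Comparing two rankers with `evaluate_per_bucket` rather than with a single score over the whole collection limits how much a system can gain merely by preferring long sentences, since every sentence within a subcollection has a similar length.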