Creating a test collection for citation-based IR experiments

  • Authors:
  • Anna Ritchie; Simone Teufel; Stephen Robertson

  • Affiliations:
  • University of Cambridge, Cambridge, U.K.; University of Cambridge, Cambridge, U.K.; Microsoft Research Ltd, Cambridge, U.K.

  • Venue:
  • HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
  • Year:
  • 2006

Abstract

We present an approach to building a test collection of research papers. The approach is based on the Cranfield 2 tests but uses a current conference as its vehicle: research questions, and relevance judgements for all cited papers, are elicited from the conference authors. The resulting test collection differs from the TREC collections in that it comprises scientific articles rather than newspaper text and thus allows for IR experiments that use citation information. The test collection currently consists of 170 queries with relevance judgements; the document collection is the ACL Anthology. We describe properties of our queries and relevance judgements and demonstrate the use of the test collection in an experimental setup. One potentially problematic property of our collection is that queries have few relevant documents; we discuss ways of alleviating this.
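A test collection of this kind pairs each query with its relevance judgements (often called qrels) and is used to score a system's ranked output for every query. As a minimal sketch, assuming uninterpolated mean average precision (MAP) as the evaluation measure (the abstract does not name one) and using hypothetical, made-up ACL Anthology-style document IDs and toy rankings, an evaluation loop over such a collection might look like this:

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Uninterpolated average precision of one ranked list for one query."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    # Divide by all relevant docs, so unretrieved relevant docs contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """Mean of per-query average precision over every judged query."""
    scores = [average_precision(runs.get(qid, []), rel) for qid, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: query IDs, document IDs and rankings are invented.
qrels = {
    "q1": {"P05-1001"},
    "q2": {"W04-0102", "J03-1003"},
}
runs = {
    "q1": ["P05-1001", "P05-1002"],
    "q2": ["P05-1003", "W04-0102", "P05-1004", "J03-1003"],
}

print(mean_average_precision(runs, qrels))  # prints 0.75
```

With only one relevant document, average precision reduces to the reciprocal rank of that document, so moving it from rank 1 to rank 2 halves the query's score. This sensitivity is one reason a low number of relevant documents per query can make evaluation results less stable.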