Creating a test collection for citation-based IR experiments

Authors:
Anna Ritchie;Simone Teufel;Stephen Robertson
Affiliations:
University of Cambridge, Cambridge, U.K.;University of Cambridge, Cambridge, U.K.;Microsoft Research Ltd, Cambridge, U.K.
Venue:
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Year:
2006

Citing 8

Life, death, and lawfulness on the electronic frontier
Proceedings of the ACM SIGCHI Conference on Human factors in computing systems

The Cranfield tests on index language devices
Readings in information retrieval

Variations in relevance judgments and the measurement of retrieval effectiveness
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval

The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7

Evaluating evaluation measure stability
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval

Topical locality in the Web
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval

A probabilistic model of information retrieval: development and comparative experiments
Information Processing and Management: an International Journal

Retrieval evaluation with incomplete information
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval

Cited 9

Comparing citation contexts for information retrieval
Proceedings of the 17th ACM conference on Information and knowledge management

Automatic classification of citation function
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

How to find better index terms through citations
CLIIR '06 Proceedings of the Workshop on How Can Computational Linguistics Improve Information Retrieval?

An annotation scheme for citation function
SigDIAL '06 Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue

So many topics, so little time
ACM SIGIR Forum

Using terms from citations for IR: some first results
ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval

Who should I cite: learning literature search models from citation behavior
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management

Creating a test collection: relevance judgements of cited & non-cited papers
Large Scale Semantic Access to Content (Text, Image, Video, and Sound)

Towards an ACL anthology corpus with logical document structure: an overview of the ACL 2012 contributed task
ACL '12 Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries

Abstract

We present an approach to building a test collection of research papers. The approach is based on the Cranfield 2 tests but uses as its vehicle a current conference; research questions and relevance judgements of all cited papers are elicited from conference authors. The resultant test collection is different from TREC's in that it comprises scientific articles rather than newspaper text and, thus, allows for IR experiments that include citation information. The test collection currently consists of 170 queries with relevance judgements; the document collection is the ACL Anthology. We describe properties of our queries and relevance judgements, and demonstrate the use of the test collection in an experimental setup. One potentially problematic property of our collection is that queries have a low number of relevant documents; we discuss ways of alleviating this.
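
To make concrete how a collection of queries with relevance judgements can drive an IR experiment, here is a minimal sketch that scores a ranked retrieval run with mean average precision. It illustrates a standard evaluation measure and is not the experimental setup described in the paper. The query IDs, document IDs, and the `qrels` and `run` structures are hypothetical toy data.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant doc IDs."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, accumulated only at relevant documents.
            precision_sum += hits / rank
    # Normalise by the number of judged-relevant documents for the query.
    return precision_sum / len(relevant)


def mean_average_precision(
    run: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of per-query average precision over all judged queries."""
    scores = [average_precision(run.get(qid, []), rel) for qid, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: few relevant documents per query, as in the
# property the abstract flags as potentially problematic.
qrels = {"q1": {"d1", "d4"}, "q2": {"d7"}}
run = {"q1": ["d1", "d2", "d4", "d3"], "q2": ["d5", "d7", "d6"]}

map_score = mean_average_precision(run, qrels)
print(f"MAP = {map_score:.4f}")
```

With only one or two relevant documents per query, a single rank change can shift a query's score substantially, which is one reason a low number of relevant documents per query can make comparisons between systems less stable.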