We conducted an experiment to test the completeness of the relevance judgments for the monolingual German, French, English, and Persian (Farsi) information retrieval tasks of the Ad Hoc Track of the Cross-Language Evaluation Forum (CLEF) 2009. In each ad hoc retrieval task, the system was given 50 natural-language queries, and the goal was to find all of the relevant documents (with high precision) in a particular document set. For each language, we submitted a sample of the first 10,000 retrieved items to investigate the frequency of relevant items at ranks deeper than the official judging depth (60 for German, French, and English; 80 for Persian). The results suggest that, on average, the percentage of relevant items that were assessed was less than 62% for German, 27% for French, 35% for English, and 22% for Persian.
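The headline percentages are estimates of how many of the relevant items the official judging pools actually captured. A minimal sketch of one way to form such an estimate from a stratified sample of judged ranks follows; the function name, strata boundaries, and sampling rates are illustrative assumptions for exposition, not the exact protocol of the CLEF 2009 experiment.

```python
# Hedged sketch: estimate what fraction of the relevant items (to depth
# 10,000) fall within the official judging depth, given a stratified
# sample of judged ranks. Sampling rates and values are illustrative.

def estimate_assessed_fraction(sample, judging_depth):
    """sample: list of (rank, sampling_rate, is_relevant) tuples for one topic.

    Each sampled relevant rank is weighted by the inverse of its sampling
    rate (a Horvitz-Thompson style estimator) to estimate the total number
    of relevant items in ranks 1..10,000, and the share of that total that
    lies at ranks <= judging_depth.
    """
    est_total = 0.0   # estimated relevant items in ranks 1..10,000
    est_within = 0.0  # estimated relevant items in ranks 1..judging_depth
    for rank, rate, is_relevant in sample:
        if not is_relevant:
            continue
        weight = 1.0 / rate  # inverse-probability weight for this stratum
        est_total += weight
        if rank <= judging_depth:
            est_within += weight
    return est_within / est_total if est_total else float("nan")

# Illustrative usage (made-up data): ranks 1-60 judged fully (rate 1.0),
# deeper ranks sampled at 1 in 100 (rate 0.01).
sample = [(5, 1.0, True), (42, 1.0, True), (310, 0.01, True), (7200, 0.01, False)]
print(estimate_assessed_fraction(sample, judging_depth=60))
```

The inverse-probability weighting is what lets a sparse sample of the deep ranks stand in for full judgments to depth 10,000: each relevant item found in a lightly sampled stratum counts for the many unsampled ranks it represents.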