Incomplete relevance judgment has become the norm in major information retrieval evaluation exercises such as TREC, but its effect on system-oriented measures is not well understood. In this paper, we evaluate four system measures under incomplete relevance judgment: mean average precision, R-precision, normalized average precision over all documents, and normalized discounted cumulative gain. Among them, normalized average precision over all documents is newly introduced, and both mean average precision and R-precision are generalized to graded relevance judgment. The four measures share a common characteristic: computing their exact values requires complete relevance judgment. We investigate these measures empirically through extensive experiments on TREC data to determine how incomplete relevance judgment affects them. The experiments show that incomplete relevance judgment significantly affects the values of all four measures: with the pooling method used in TREC, the more incomplete the relevance judgment, the higher the measures' values usually become. We also find that mean average precision is the most sensitive but least reliable measure, that normalized discounted cumulative gain and normalized average precision over all documents are the most reliable but least sensitive, and that R-precision falls in between.
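To make the measures concrete, the following is a minimal sketch of the standard forms of average precision, R-precision, and normalized discounted cumulative gain. It does not reproduce the normalized-average-precision variant or the graded generalizations introduced in the paper; note how the count of relevant documents enters the first two measures, since under incomplete judgment only the judged relevant documents are counted, which is one source of the bias discussed above.

```python
import math

def average_precision(ranked_rels, total_relevant):
    """Average precision over a ranked list of binary relevance labels.

    ranked_rels: 0/1 relevance labels in rank order.
    total_relevant: number of relevant documents for the topic; under
    incomplete judgment this is only the number of *judged* relevant
    documents.
    """
    hits, score = 0, 0.0
    for rank, rel in enumerate(ranked_rels, start=1):
        if rel:
            hits += 1
            score += hits / rank  # precision at each relevant rank
    return score / total_relevant if total_relevant else 0.0

def r_precision(ranked_rels, total_relevant):
    """Precision at rank R, where R = number of relevant documents."""
    if not total_relevant:
        return 0.0
    return sum(ranked_rels[:total_relevant]) / total_relevant

def ndcg(ranked_gains, all_gains, k=None):
    """Normalized DCG with a log2 discount (gain at rank 1 undiscounted).

    ranked_gains: graded gain values in rank order.
    all_gains: gain values of all judged documents, used to build the
    ideal (best possible) ranking for normalization.
    """
    def dcg(gains):
        return sum(g / math.log2(r + 1) for r, g in enumerate(gains, start=1))
    k = k or len(ranked_gains)
    ideal = dcg(sorted(all_gains, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / ideal if ideal else 0.0
```

Mean average precision is then simply `average_precision` averaged over a set of topics. Because each function normalizes by a quantity derived from the judged relevant set, shrinking that set (deeper incompleteness) changes the scores, which is exactly the effect the experiments quantify.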