The notion of relevance differs from assessor to assessor, giving rise to assessor disagreement. Although such disagreement has been observed frequently, the factors that lead to it remain an open problem. In this paper we study the relationship between assessor disagreement and various topic-independent factors such as readability and cohesiveness. We build a logistic model that uses reading level and other simple document features to predict assessor disagreement, and we rank documents by decreasing probability of disagreement. We compare the predictive power of these document-level features with that of a meta-search feature that aggregates a document's rank across multiple retrieval runs. Our features prove to be on a par with the meta-search feature, without requiring a large and diverse set of retrieval runs to compute. Surprisingly, however, we find that the reading-level features are negatively correlated with disagreement, suggesting that they are detecting some other aspect of document content.
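To make the modeling step concrete, the following is a minimal sketch of the setup the abstract describes, assuming scikit-learn. The feature names (reading_level, cohesiveness) and all values are illustrative placeholders, not the paper's actual feature set or data; the point is only the pipeline: fit a logistic model on per-document features against a binary disagreement label, then rank documents by decreasing predicted probability of disagreement.

```python
# A minimal sketch, assuming scikit-learn. Features and values are
# illustrative; the paper's feature set is richer than shown here.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: one row per document, with topic-independent
# features such as a reading-level score and a cohesiveness score.
# y = 1 if assessors disagreed on the document's relevance, else 0.
X_train = np.array([
    [8.2, 0.61],   # [reading_level, cohesiveness] -- made-up values
    [11.5, 0.34],
    [6.9, 0.72],
    [13.1, 0.28],
])
y_train = np.array([0, 1, 0, 1])

model = LogisticRegression()
model.fit(X_train, y_train)

# Rank unseen documents by decreasing predicted probability of
# disagreement, as the abstract describes.
X_test = np.array([[9.4, 0.55], [12.0, 0.31], [7.3, 0.68]])
p_disagree = model.predict_proba(X_test)[:, 1]
ranking = np.argsort(-p_disagree)
print("documents ordered by P(disagreement):", ranking, p_disagree[ranking])
```

The same ranking-by-probability scheme would apply unchanged if the meta-search feature (a document's aggregated rank across multiple retrieval runs) were added as an extra column, which is how the two feature groups can be compared head to head.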