This paper studies the quality of the human labels used to train search engine rankers. Our specific focus is the performance improvement obtained by using overlapping relevance labels, i.e., by collecting multiple human judgments for each training sample. The paper explores whether, when, and for which samples one should obtain overlapping training labels, as well as how many labels per sample are needed. The proposed selective labeling scheme collects additional labels only for a subset of training samples, specifically those labeled relevant by a judge. Our experiments show that this scheme improves the NDCG of two Web search rankers on several real-world test sets, with a low labeling overhead of around 1.4 labels per sample. It also outperforms several other methods of using overlapping labels, such as simple k-overlap, majority vote, and taking the highest label. Finally, the paper presents a study of how many overlapping labels are needed to achieve the best improvement in retrieval accuracy.
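
The selective labeling idea can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration, not the paper's implementation: the grade vocabulary, the number of extra labels, the request_label stand-in for querying a judge, and the majority-vote aggregation (majority vote is in fact one of the baselines the paper compares against; it is used below only to make the sketch self-contained).

# Minimal sketch of a selective labeling scheme: collect one label per
# sample, then request additional labels only for samples whose first
# label is "relevant". All names and parameters here are hypothetical.
from collections import Counter
import random

RELEVANT_GRADES = {"relevant"}   # assumed grade vocabulary
EXTRA_LABELS = 2                 # assumed number of extra judgments per flagged sample

def request_label(sample):
    """Stand-in for querying a human judge; returns a relevance grade."""
    return random.choice(["relevant", "not_relevant"])

def selective_label(samples):
    labeled = []
    total_labels = 0
    for sample in samples:
        labels = [request_label(sample)]
        total_labels += 1
        # Only samples judged relevant on the first pass receive
        # overlapping labels; non-relevant samples keep their single label.
        if labels[0] in RELEVANT_GRADES:
            labels += [request_label(sample) for _ in range(EXTRA_LABELS)]
            total_labels += EXTRA_LABELS
        # Aggregate overlapping labels by majority vote (an illustrative
        # placeholder; the paper evaluates several aggregation methods).
        final = Counter(labels).most_common(1)[0][0]
        labeled.append((sample, final))
    print(f"overhead: {total_labels / len(samples):.2f} labels per sample")
    return labeled

if __name__ == "__main__":
    data = [f"query-doc-{i}" for i in range(1000)]
    selective_label(data)

Because extra judgments are requested only for the (typically small) fraction of samples first judged relevant, the average overhead stays well below full k-overlap labeling, consistent with the roughly 1.4 labels per sample reported above.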