Traditional search evaluation approaches have often relied on domain experts to evaluate the results for each query. Unfortunately, the range of topics present in any representative sample of web queries makes it impractical to have expert evaluators for every topic. In this paper, we investigate the effect of using "generalist" evaluators instead of experts in the domain of the queries being evaluated. Empirically, we find that for queries drawn from domains requiring high expertise: (1) generalists tend to give shallow, inaccurate ratings compared to experts; (2) generalists disagree on the underlying meaning of these queries significantly more often than experts, and often appear to "give up" and fall back on surface features such as keyword matching; and (3) by estimating the percentage of "expertise-requiring" queries in a web query sample, we can estimate the impact of using generalists versus the ideal of having a domain expert for every "expertise-requiring" query.
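The impact estimate described in point (3) amounts to weighting rating quality on expertise-requiring and ordinary queries by their share of the query sample. Below is a minimal sketch of that arithmetic in Python; the fraction of expertise-requiring queries and the per-group agreement rates are illustrative assumptions, not figures reported in the paper.

```python
# Hypothetical illustration of the impact estimate in point (3).
# All numbers below are assumptions for this sketch, not results from the paper.

# Assumed fraction of web queries that require domain expertise to rate.
p_expertise = 0.15

# Assumed probability that a rating matches the expert "gold" rating.
acc_generalist_on_expert_queries = 0.70   # generalists on expertise-requiring queries
acc_expert_on_expert_queries = 0.90       # experts on expertise-requiring queries
acc_on_ordinary_queries = 0.88            # either rater type on ordinary queries

def overall_accuracy(acc_on_expertise_queries: float) -> float:
    """Accuracy over the whole query sample, weighted by the query mix."""
    return (p_expertise * acc_on_expertise_queries
            + (1 - p_expertise) * acc_on_ordinary_queries)

generalists_only = overall_accuracy(acc_generalist_on_expert_queries)
experts_ideal = overall_accuracy(acc_expert_on_expert_queries)

print(f"generalists on all queries:            {generalists_only:.3f}")
print(f"experts on expertise-requiring queries: {experts_ideal:.3f}")
print(f"estimated cost of using generalists:    {experts_ideal - generalists_only:.3f}")
```

Under these assumed numbers, the overall degradation is small because expertise-requiring queries are a minority of the sample, which is the kind of trade-off the estimate in point (3) is meant to quantify.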