Repeatable and reliable semantic search evaluation
Web Semantics: Science, Services and Agents on the World Wide Web
The primary problem confronting any new kind of search task is how to bootstrap a reliable and repeatable evaluation campaign, and crowdsourcing offers many advantages here. However, can crowdsourced evaluations be repeated reliably over long periods of time? To answer this question, we create an evaluation campaign for the semantic search task of keyword-based ad-hoc object retrieval. In contrast to traditional search over web pages with textual descriptions, object retrieval targets factual assertions about real-world objects. Using the first large-scale evaluation campaign that specifically targets ad-hoc Web object retrieval over a number of deployed systems, we demonstrate that crowdsourced evaluation campaigns can be repeated over time while maintaining reliable results. Furthermore, we show that these results are comparable to those of expert judges when ranking systems, and that they hold across different evaluation and relevance metrics. This work provides empirical support for scalable, reliable, and repeatable search system evaluation using crowdsourcing.
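As a rough, hypothetical illustration of the kind of comparison the abstract describes, the sketch below measures how closely a system ranking induced by crowd judgments agrees with one induced by expert judgments, using Kendall's tau over per-system effectiveness scores. All system names and scores are invented for the example and are not the campaign's actual data or protocol.

```python
# Hypothetical sketch: agreement between crowd-based and expert-based system rankings.
# All system names and effectiveness scores below are invented for illustration.

from itertools import combinations

def kendall_tau(scores_a, scores_b):
    """Kendall tau-a over two score lists for the same systems (assumes no ties)."""
    pairs = list(combinations(range(len(scores_a)), 2))
    concordant = sum(
        1 for i, j in pairs
        if (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j]) > 0
    )
    discordant = len(pairs) - concordant
    return (concordant - discordant) / len(pairs)

# Per-system effectiveness (e.g., MAP or NDCG) computed once from crowd judgments
# and once from expert judgments.
systems       = ["sys_A", "sys_B", "sys_C", "sys_D", "sys_E"]
crowd_scores  = [0.41, 0.35, 0.52, 0.28, 0.47]
expert_scores = [0.39, 0.40, 0.55, 0.25, 0.44]

tau = kendall_tau(crowd_scores, expert_scores)
print(f"Kendall tau between crowd and expert system rankings: {tau:.2f}")
```

A tau close to 1 would indicate that the two assessment sources order the systems almost identically, which is the sense in which crowd and expert judgments can be said to be comparable for ranking systems.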