Scaling IR-system evaluation using term relevance sets

Authors:
Einat Amitay;David Carmel;Ronny Lempel;Aya Soffer
Affiliations:
IBM Haifa Research Lab, Haifa, Israel;IBM Haifa Research Lab, Haifa, Israel;IBM Haifa Research Lab, Haifa, Israel;IBM Haifa Research Lab, Haifa, Israel
Venue:
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2004

Abstract

This paper describes an evaluation method based on Term Relevance Sets (Trels) that measures an IR system's quality by examining the content of the retrieved results rather than by looking for pre-specified relevant pages. Each Trel consists of a list of terms believed to be relevant for a particular query and a list of terms believed to be irrelevant. The proposed method does not involve any document relevance judgments, and as such is not adversely affected by changes to the underlying collection. Therefore, it can scale better to very large, dynamic collections such as the Web. Moreover, this method can evaluate a system's effectiveness on an updatable "live" collection, or on collections derived from different data sources. Our experiments show that the results of the proposed method are highly correlated with official TREC measures.
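The abstract does not give the scoring function, so the following is only a minimal sketch of the general idea, under stated assumptions: a retrieved document is rewarded for each relevant term it contains and penalized for each irrelevant term, and a result list is scored by averaging over its top k documents. The function names, the single-word tokenization, the "hits minus misses" formula, and the example query are all hypothetical illustrations, not the authors' method.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def trels_doc_score(text, relevant_terms, irrelevant_terms):
    """Assumed scoring rule: +1 per relevant term present, -1 per irrelevant term present."""
    tokens = tokenize(text)
    hits = sum(1 for term in relevant_terms if term in tokens)
    misses = sum(1 for term in irrelevant_terms if term in tokens)
    return hits - misses


def trels_result_score(results, relevant_terms, irrelevant_terms, k=10):
    """Average the per-document score over the top-k retrieved documents."""
    top = results[:k]
    if not top:
        return 0.0
    total = sum(trels_doc_score(doc, relevant_terms, irrelevant_terms) for doc in top)
    return total / len(top)


# Hypothetical Trel for the query "jaguar", meaning the animal.
relevant = {"cat", "jungle", "predator"}
irrelevant = {"car", "dealer"}

# Hypothetical top results returned by two systems for the same query.
system_a = [
    "The jaguar is a big cat and a jungle predator.",
    "Jaguar habitat: jungle and rainforest.",
]
system_b = [
    "Visit your local Jaguar car dealer today.",
    "The jaguar is a big cat.",
]

score_a = trels_result_score(system_a, relevant, irrelevant)
score_b = trels_result_score(system_b, relevant, irrelevant)
print(score_a, score_b)
```

Because the score depends only on the text of whatever documents a system returns, no per-document relevance judgment is needed, which is why such a measure would be unaffected by documents entering or leaving the collection. This sketch handles single-word terms only; multi-word terms would need phrase matching.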