Evaluation by highly relevant documents

Authors:
Ellen M. Voorhees
Affiliations:
National Institute of Standards and Technology, Gaithersburg, MD
Venue:
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2001

Citing 8
Cited 80

Measures of relative relevance and ranked half-life: performance indicators for interactive IR

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
The anatomy of a large-scale hypertextual Web search engine

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Finding information on the World Wide Web: the retrieval effectiveness of search engines

Information Processing and Management: an International Journal
Results and challenges in Web search evaluation

WWW '99 Proceedings of the eighth international conference on World Wide Web
Overview of the sixth text REtrieval conference (TREC-6)

Information Processing and Management: an International Journal - The sixth text REtrieval conference (TREC-6)
Evaluating evaluation measure stability

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
IR evaluation methods for retrieving highly relevant documents

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Variations in relevance judgments and the measurement of retrieval effectiveness

Information Processing and Management: an International Journal

Analysis of lexical signatures for finding lost or related documents

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Liberal relevance criteria of TREC: counting on negligible documents?

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Cumulated gain-based evaluation of IR techniques

ACM Transactions on Information Systems (TOIS)
Evaluation of Text Retrieval Systems

Programming and Computer Software
Using graded relevance assessments in IR evaluation

Journal of the American Society for Information Science and Technology
Using manually-built web directories for automatic evaluation of known-item retrieval

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Result merging strategies for a current news metasearcher

Information Processing and Management: an International Journal
The concept of relevance in IR

Journal of the American Society for Information Science and Technology
Evaluating database selection algorithms for distributed search

Proceedings of the 2003 ACM symposium on Applied computing
Engineering a multi-purpose test collection for web retrieval experiments

Information Processing and Management: an International Journal
Using titles and category names from editor-driven taxonomies for automatic evaluation

CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Measuring retrieval effectiveness: a new proposal and a first experimental validation

Journal of the American Society for Information Science and Technology
Retrieval evaluation with incomplete information

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Forming test collections with no system pooling

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Analysis of lexical signatures for improving information persistence on the World Wide Web

ACM Transactions on Information Systems (TOIS)
The influence of relevance levels on the effectiveness of interactive information retrieval

Journal of the American Society for Information Science and Technology
Evaluating implicit measures to improve web search

ACM Transactions on Information Systems (TOIS)
Evaluating the evaluation: a case study using the TREC 2002 question answering track

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Binary and graded relevance in IR evaluations: comparison of the effects on ranking of IR systems

Information Processing and Management: an International Journal
Incremental test collections

Proceedings of the 14th ACM international conference on Information and knowledge management
Hierarchical clustering of a Finnish newspaper article collection with graded relevance assessments

Information Retrieval
Building a reusable test collection for question answering

Journal of the American Society for Information Science and Technology - Research Articles
High accuracy retrieval with multiple nested ranker

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Estimating average precision with incomplete and imperfect judgments

CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
eXtended cumulated gain measures for the evaluation of content-oriented XML retrieval

ACM Transactions on Information Systems (TOIS)
Percent perfect performance (PPP)

Information Processing and Management: an International Journal
On the reliability of information retrieval metrics based on graded relevance

Information Processing and Management: an International Journal - Special issue: AIRS2005: Information retrieval research in Asia
"What is a good digital library?" - A quality model for digital libraries

Information Processing and Management: an International Journal
An analysis of two approaches in information retrieval: From frameworks to study designs

Journal of the American Society for Information Science and Technology
Distributed web search efficiency by truncating results

Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
On rank correlation in information retrieval evaluation

ACM SIGIR Forum
On the robustness of relevance measures with incomplete judgments

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
How well does result relevance predict session satisfaction?

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Problems with Kendall's tau

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Semantic components enhance retrieval of domain-specific documents

Proceedings of the sixteenth ACM conference on information and knowledge management
Evaluating epistemic uncertainty under incomplete assessments

Information Processing and Management: an International Journal
Extracting related named entities from blogosphere for event mining

Proceedings of the 2nd international conference on Ubiquitous information management and communication
Evaluating the effectiveness of relevance feedback based on a user simulation model: effects of a user scenario on cumulated gain value

Information Retrieval
On information retrieval metrics designed for evaluation with incomplete relevance assessments

Information Retrieval
A new rank correlation coefficient for information retrieval

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Intuition-supporting visualization of user's performance based on explicit negative higher-order relevance

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Estimating average precision when judgments are incomplete

Knowledge and Information Systems
CMIC at INEX 2007: Book Search Track

Focused Access to XML Documents
A Comparison of Interactive and Ad-Hoc Relevance Assessments

Focused Access to XML Documents
Book search: indexing the valuable parts

Proceedings of the 2008 ACM workshop on Research advances in large digital book repositories
An evolutionary approach for combining different sources of evidence in search engines

Information Systems
Psychiatric document retrieval using a discourse-aware model

Artificial Intelligence
Using Multiple Query Aspects to Build Test Collections without Human Relevance Judgments

ECIR '09 Proceedings of the 31th European Conference on IR Research on Advances in Information Retrieval
Including summaries in system evaluation

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Using semantic components to search for domain-specific documents: An evaluation from the system perspective and the user perspective

Information Systems
Building a framework for the probability ranking principle by a family of expected weighted rank

ACM Transactions on Information Systems (TOIS)
Empirical justification of the gain and discount function for nDCG

Proceedings of the 18th ACM conference on Information and knowledge management
Weighted Rank Correlation in Information Retrieval Evaluation

AIRS '09 Proceedings of the 5th Asia Information Retrieval Symposium on Information Retrieval Technology
Binary and graded relevance in IR evaluations-Comparison of the effects on ranking of IR systems

Information Processing and Management: an International Journal
Popularity weighted ranking for academic digital libraries

ECIR'07 Proceedings of the 29th European conference on IR research
PaMS: A component-based service for finding the missing full text of articles cataloged in a digital library

Information Systems
Discounted cumulated gain based evaluation of multiple-query IR sessions

ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval
Robust query-specific pseudo feedback document selection for query expansion

ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval
Collecting high quality overlapping labels at low cost

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Extending average precision to graded relevance judgments

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
An empirical analysis of ontology-based query expansion for learning resource searches using MERLOT and the Gene ontology

Knowledge-Based Systems
Examining the robustness of evaluation metrics for patent retrieval with incomplete relevance judgements

CLEF'10 Proceedings of the 2010 international conference on Multilingual and multimodal information access evaluation: cross-language evaluation forum
Trust your social network according to satisfaction, reputation and privacy

Proceedings of the Third International Workshop on Reliability, Availability, and Security
Is a query worth translating: ask the users!

ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
Indexing and weighting of multilingual and mixed documents

Proceedings of the South African Institute of Computer Scientists and Information Technologists Conference on Knowledge, Innovation and Leadership in a Diverse, Multidisciplinary Environment
User effect in evaluating personalized information retrieval systems

EC-TEL'06 Proceedings of the First European conference on Technology Enhanced Learning: innovative Approaches for Learning and Knowledge Sharing
INEX 2005 evaluation measures

INEX'05 Proceedings of the 4th international conference on Initiative for the Evaluation of XML Retrieval
Evaluating scalability in information retrieval with multigraded relevance

AIRS'06 Proceedings of the Third Asia conference on Information Retrieval Technology
Dictionary-based CLIR loses highly relevant documents

ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research
Contexts of relevance for information retrieval system design

CoLIS'05 Proceedings of the 5th international conference on Context: conceptions of Library and Information Sciences
Measures for benchmarking semantic web service matchmaking correctness

ESWC'10 Proceedings of the 7th international conference on The Semantic Web: research and Applications - Volume Part II
What evaluation criteria are right for CCBR? considering rank quality

ECCBR'06 Proceedings of the 8th European conference on Advances in Case-Based Reasoning
The effects of relevance feedback quality and quantity in interactive relevance feedback: a simulation based on user modeling

ECIR'06 Proceedings of the 28th European conference on Advances in Information Retrieval
Top-k learning to rank: labeling, ranking and evaluation

SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
An analysis of systematic judging errors in information retrieval

Proceedings of the 21st ACM international conference on Information and knowledge management
Identifying top news using crowdsourcing

Information Retrieval
On Using Fewer Topics in Information Retrieval Evaluations

Proceedings of the 2013 Conference on the Theory of Information Retrieval
Evaluating question answering over linked data

Web Semantics: Science, Services and Agents on the World Wide Web
Evaluation in Music Information Retrieval

Journal of Intelligent Information Systems

Abstract

Given the size of the web, the search engine industry has argued that engines should be evaluated by their ability to retrieve highly relevant pages rather than all possible relevant pages. To explore the role highly relevant documents play in retrieval system evaluation, assessors for the TREC-9 web track used a three-point relevance scale and also selected best pages for each topic. The relative effectiveness of runs evaluated by different relevant document sets differed, confirming the hypothesis that different retrieval techniques work better for retrieving highly relevant documents. Yet evaluating by highly relevant documents can be unstable since there are relatively few highly relevant documents. TREC assessors frequently disagreed in their selection of the best page, and subsequent evaluation by best page across different assessors varied widely. The discounted cumulative gain measure introduced by Järvelin and Kekäläinen increases evaluation stability by incorporating all relevance judgments while still giving precedence to highly relevant documents.
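
As a rough illustration of how discounted cumulative gain can use every judgment on a three-point scale while still favoring highly relevant documents, the following minimal sketch may help. The gain values in `GAINS`, the function name `dcg`, and the choice of log base 2 are assumptions made for this example and are not taken from the paper; the discount follows the commonly cited form in which ranks below the log base are left undiscounted and later ranks are divided by the logarithm of the rank.

```python
import math

# Hypothetical gain values for a three-point relevance scale (an assumption
# for illustration): 0 = not relevant, 1 = relevant, 2 = highly relevant.
GAINS = {0: 0.0, 1: 1.0, 2: 3.0}


def dcg(judgments, base=2):
    """Return the discounted cumulative gain of a ranked list.

    `judgments` holds the relevance level (0, 1, or 2) of the document
    at each rank, best-ranked document first.
    """
    total = 0.0
    for rank, level in enumerate(judgments, start=1):
        gain = GAINS[level]
        # Ranks below the log base are not discounted; from rank == base
        # onward the gain is divided by log_base(rank).
        discount = math.log(rank, base) if rank >= base else 1.0
        total += gain / discount
    return total


# Two hypothetical runs that retrieve the same three documents in opposite
# order: the run that places the highly relevant document first scores higher.
good_run = dcg([2, 1, 0])
poor_run = dcg([0, 1, 2])
```

Because every judged document contributes some gain, the score does not hinge on the handful of highly relevant documents alone, which is one plausible reading of why the abstract reports the measure as more stable than evaluation by highly relevant documents only.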