We introduce a general information access evaluation framework that can potentially handle summaries, ranked document lists, and even multi-query sessions seamlessly. Our framework first builds a trailtext, a concatenation of all the texts read by the user during a search session, and then computes an evaluation metric called U-measure over the trailtext. Instead of discounting the value of a retrieved piece of information based on its rank, U-measure discounts it based on its position within the trailtext. U-measure takes document length into account, just like Time-Biased Gain (TBG), and has the diminishing-return property; it is therefore more realistic than rank-based metrics. Furthermore, it is arguably more flexible than TBG, as it is free from the linear traversal assumption (i.e., that the user scans the ranked list from top to bottom) and can handle information access tasks other than ad hoc retrieval. This paper demonstrates the validity and versatility of the U-measure framework. Our main conclusions are: (a) for ad hoc retrieval, U-measure is at least as reliable as TBG in terms of rank correlations with traditional metrics and discriminative power; (b) for diversified search, our diversity versions of U-measure are highly correlated with state-of-the-art diversity metrics; (c) for multi-query sessions, U-measure is highly correlated with Session nDCG; and (d) unlike rank-based metrics such as DCG, U-measure can quantify the differences between linear and nonlinear traversals in sessions. We argue that our new framework is useful for understanding the user's search behaviour and for comparison across different information access styles (e.g., examining a direct answer vs. examining a ranked list of web pages).
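To make the position-based discount concrete, the following is a minimal Python sketch of a U-measure-style computation over a trailtext. The linear decay d(pos) = max(0, 1 - pos/L), the patience parameter L (measured in characters), and the trailtext construction shown here are illustrative assumptions inferred from the description above, not necessarily the authors' exact formulation.

```python
# Minimal sketch of a U-measure-style computation (illustrative assumptions:
# a linear position-based decay d(pos) = max(0, 1 - pos / L) with a patience
# parameter L in characters; not necessarily the paper's exact formula).

def u_measure(read_texts, gains, L=10000, normalizer=1.0):
    """read_texts: texts read by the user, in reading order (their
    concatenation is the trailtext); gains: per-text relevance gains."""
    trailtext_len = 0
    score = 0.0
    for text, gain in zip(read_texts, gains):
        trailtext_len += len(text)                # position at the end of this piece of text
        decay = max(0.0, 1.0 - trailtext_len / L) # text appearing later in the trailtext counts less
        score += gain * decay
    return score / normalizer


if __name__ == "__main__":
    # A user reads two snippets and then one long document in a session.
    texts = ["short relevant snippet", "another snippet",
             "a long full document " * 200]
    gains = [1.0, 0.0, 2.0]                       # graded relevance of each piece of text
    print(u_measure(texts, gains))
```

Because the discount depends on the cumulative amount of text read rather than on rank, a long but marginally relevant document pushes later relevant material further down the trailtext and lowers its contribution, which is one way the diminishing-return and document-length properties described above can manifest.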