Relevance evaluation is an important topic in Web search engine research. Traditional evaluation methods rely on a huge amount of human effort, which makes the process extremely time-consuming in practice. By analyzing large-scale user query logs and click-through data, we propose a performance evaluation method that fully automatically generates large-scale Web search topics and answer sets under the Cranfield framework. These query-to-answer pairs are used directly for relevance evaluation with several widely adopted precision/recall-related retrieval performance metrics. Beyond the analysis of a single search engine's log, we propose user behavior models built on the click-through logs of multiple search engines to reduce potential bias among engines. Experimental results show that the evaluation results are similar to those obtained by traditional human annotation, while our method avoids the bias and subjectivity of manual judgments by experts in traditional approaches.
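To make the pipeline described above concrete, here is a minimal sketch of how click-through logs might be turned into query-to-answer pairs and used to score an engine with precision at k. The click threshold, field names, and aggregation rule are illustrative assumptions, not the exact procedure or user behavior models proposed in the paper.

```python
# Illustrative sketch: derive answer sets from click logs, then compute precision@k.
from collections import defaultdict

CLICK_THRESHOLD = 5  # hypothetical cutoff: clicks needed before a URL counts as an answer


def build_answer_sets(click_log):
    """click_log: iterable of (query, clicked_url) records from a search engine log."""
    counts = defaultdict(lambda: defaultdict(int))
    for query, url in click_log:
        counts[query][url] += 1
    # Keep only URLs clicked often enough to be treated as relevant answers.
    return {
        q: {u for u, c in urls.items() if c >= CLICK_THRESHOLD}
        for q, urls in counts.items()
    }


def precision_at_k(ranked_urls, answers, k=10):
    """Fraction of the top-k returned URLs that appear in the answer set."""
    top_k = ranked_urls[:k]
    if not top_k:
        return 0.0
    return sum(1 for u in top_k if u in answers) / len(top_k)


def evaluate_engine(search_results, answer_sets, k=10):
    """search_results: mapping query -> ranked list of URLs returned by the engine.
    Returns mean precision@k over all automatically generated topics."""
    scores = [
        precision_at_k(search_results[q], ans, k)
        for q, ans in answer_sets.items()
        if q in search_results
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

In this sketch, pooling click logs from several engines before calling build_answer_sets would be one simple way to approximate the paper's idea of reducing single-engine bias, though the actual models in the paper are more elaborate.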