Retrieval result evaluation is a central concern in information retrieval systems and digital libraries. To date, almost all commonly used metrics, such as average precision and recall-level precision, are ranking-based. In this work, we investigate whether a score-based method, the Euclidean distance, is a viable option for retrieval evaluation. We discuss two variations: one uses a linear model to estimate the relation between rank and relevance in result lists, and the other uses a more sophisticated cubic regression model. Experiments with two groups of results submitted to TREC show that the new metrics correlate strongly with ranking-based metrics when averaged over all 50 queries. Moreover, our experiments show that one of the variations (the linear model) has better overall quality than all of the ranking-based metrics considered. A further, surprising finding is that a commonly used metric, average precision, may not be as good as previously thought.
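The abstract does not give the exact formulation, but the following minimal Python sketch illustrates how such a score-based metric might be computed. The function name euclidean_distance_metric, the use of a polynomial fit of relevance against rank, and the choice of an ideal reordering of the judgments as the reference vector are all illustrative assumptions, not the authors' definitions.

    import numpy as np

    def euclidean_distance_metric(relevance, degree=1):
        """Score a ranked list by the Euclidean distance between relevance
        scores estimated from rank (linear fit for degree=1, cubic for
        degree=3) and an ideal list that places all relevant documents first.

        `relevance` holds the judged relevance of each returned document,
        in rank order. Smaller distances indicate a ranking closer to the
        ideal one. This is a sketch of the general idea, not the metric as
        defined in the paper.
        """
        relevance = np.asarray(relevance, dtype=float)
        ranks = np.arange(1, len(relevance) + 1)

        # Estimate the rank-to-relevance relation with a polynomial regression.
        coeffs = np.polyfit(ranks, relevance, deg=degree)
        estimated = np.polyval(coeffs, ranks)

        # Ideal reference: the same judgments sorted with relevant documents first.
        ideal = np.sort(relevance)[::-1]

        return float(np.linalg.norm(estimated - ideal))

    # Example run with two relevant documents near the top of a five-document list.
    print(euclidean_distance_metric([1, 0, 1, 0, 0], degree=1))  # linear variant
    print(euclidean_distance_metric([1, 0, 1, 0, 0], degree=3))  # cubic variant

The degree parameter switches between the linear and cubic variants mentioned in the abstract; under these assumptions, lower distances indicate result lists closer to the ideal ordering.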