Document retrievability is a measure used in information retrieval to identify the bias of retrieval systems. To measure system bias for a specific document collection, an exhaustive set of queries is processed and the frequency with which each document is retrieved is recorded. To better understand and handle system bias, we need to understand which document characteristics influence retrievability and, ideally, be able to identify documents with high and low retrievability in advance. To this end, we identify a number of content-based features that can be used to classify a corpus into documents with low and high retrievability with respect to a specific retrieval system. Our experiments on patent collections show that these features can achieve more than 80% classification accuracy across different retrieval systems, and they suggest a need to combine different retrieval systems to optimize recall.
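The measurement described above (run many queries, count how often each document is retrieved) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes that a document counts as "retrieved" only if it appears within a rank cutoff `cutoff`, and it allows optional per-query weights; neither detail is stated in the abstract. It also summarizes the skew of the resulting distribution with a standard Gini coefficient, one common way to express bias as a single number. The median split into "low" and "high" retrievability is a hypothetical labeling rule. The content-based features used for classification are not specified here, so the sketch stops at producing the class labels.

```python
from statistics import median


def retrievability(ranked_results, all_docs, cutoff=100, query_weights=None):
    """Count how often each document is retrieved across a query set.

    ranked_results: dict mapping query -> list of doc ids in rank order.
    Assumption: a document is counted only if it appears in the top `cutoff`
    results; each hit adds the query's weight (default 1.0).
    """
    scores = {d: 0.0 for d in all_docs}
    for q, ranking in ranked_results.items():
        weight = 1.0 if query_weights is None else query_weights.get(q, 1.0)
        for doc in ranking[:cutoff]:
            if doc in scores:
                scores[doc] += weight
    return scores


def gini(values):
    """Standard Gini coefficient: 0 means all documents are equally retrievable."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = sum_i (2i - n - 1) * x_i / (n * sum_j x_j), with x sorted ascending, i = 1..n
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)


def label_low_high(scores):
    """Hypothetical labeling rule: 'high' if at or above the median score, else 'low'."""
    m = median(scores.values())
    return {d: ("high" if s >= m else "low") for d, s in scores.items()}


ranked = {"q1": ["d1", "d2", "d3"], "q2": ["d1", "d3"], "q3": ["d1"]}
docs = ["d1", "d2", "d3", "d4"]
scores = retrievability(ranked, docs, cutoff=2)
print(scores)                  # {'d1': 3.0, 'd2': 1.0, 'd3': 1.0, 'd4': 0.0}
print(gini(scores.values()))   # 0.45
print(label_low_high(scores))  # d4 is 'low', the rest 'high'
```

In practice the query set would be generated automatically and be much larger, and the labels produced by a rule like `label_low_high` would serve as training targets for a classifier over per-document features.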