Identification of low/high retrievable patents using content-based features

Authors:
Shariq Bashir; Andreas Rauber
Affiliations:
Vienna University of Technology, Vienna, Austria; Vienna University of Technology, Vienna, Austria
Venue:
Proceedings of the 2nd international workshop on Patent information retrieval
Year:
2009

Abstract

Document retrievability is a measure used in information retrieval to identify the bias of retrieval systems. To measure system bias for a specific document collection, an exhaustive set of queries is processed and the frequency with which each document is retrieved is recorded. To better understand and handle system bias, we need to understand the characteristics of documents that influence retrievability and, ideally, be able to identify documents with high and low retrievability in advance. For this purpose, we identify a number of content-based features that can be used effectively to classify a corpus into documents with low and high retrievability with respect to a specific retrieval system. Our experiments on patent collections show that these features achieve more than 80% classification accuracy on different systems, and hint at the need to combine different retrieval systems to optimize recall.
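
The measurement procedure described in the abstract (run an exhaustive query set, then count how often each document is retrieved) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' setup: the toy term-frequency ranker `rank`, the query set built from all one- and two-term combinations of the vocabulary, the rank `cutoff`, and the three-document corpus are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase whitespace tokenizer (assumed; real systems do far more)."""
    return text.lower().split()


def rank(query_terms, docs):
    """Toy ranker: score = summed term frequency of the query terms.

    Documents with a zero score are not retrieved; ties are broken by doc id.
    This stands in for whatever retrieval system is being evaluated.
    """
    scores = {}
    for doc_id, text in docs.items():
        tf = Counter(tokenize(text))
        score = sum(tf[t] for t in query_terms)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=lambda d: (-scores[d], d))


def retrievability(docs, cutoff=1):
    """Count, per document, the queries whose top-`cutoff` results contain it."""
    vocab = sorted({t for text in docs.values() for t in tokenize(text)})
    # Hypothetical "exhaustive" query set: all 1-term and 2-term queries.
    queries = [(t,) for t in vocab] + list(combinations(vocab, 2))
    counts = {doc_id: 0 for doc_id in docs}
    for query in queries:
        for doc_id in rank(query, docs)[:cutoff]:
            counts[doc_id] += 1
    return counts


# Hypothetical three-document corpus.
docs = {
    "p1": "valve valve pump pump",
    "p2": "valve seal",
    "p3": "pump",
}
r = retrievability(docs, cutoff=1)
print(r)  # {'p1': 5, 'p2': 1, 'p3': 0}
```

On this toy corpus the long, term-rich document `p1` is returned first for five of the six queries, while `p3` is never returned at rank 1. This is the kind of system bias the counts are meant to expose. How such counts are thresholded into low and high retrievability classes is not specified in the abstract and is left out of the sketch.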