Improving retrievability with improved cluster-based pseudo-relevance feedback selection

Authors:
Shariq Bashir
Affiliations:
Institute of Software Technology and Interactive Systems, Vienna University of Technology, Austria
Venue:
Expert Systems with Applications: An International Journal
Year:
2012

Abstract

High findability of documents within a certain cut-off rank is considered an important factor in recall-oriented application domains such as patent or legal document retrieval. Findability is hindered by two factors: the inherent bias of the retrieval model, which favors some types of documents over others, and the failure to correctly capture and interpret the context of typically short queries. In this paper, we analyze the bias impact of different retrieval models and query expansion strategies. We furthermore propose a novel query expansion strategy based on document clustering to identify dominant relevant documents. This helps to overcome limitations of conventional query expansion strategies, which suffer strongly from the noise that imperfect initial query results introduce into the selection of pseudo-relevance feedback documents. Experiments with different collections of patent documents suggest that clustering-based document selection for pseudo-relevance feedback is an effective approach for increasing the findability of individual documents and decreasing the bias of a retrieval system.
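
The abstract does not spell out how findability and bias are quantified, so the following is only an illustrative sketch under stated assumptions. It uses the count-based notion of retrievability common in the retrievability literature (a document scores one point for each query that returns it at or above the cut-off rank) and summarizes bias with a Gini coefficient over those scores. The function names, the toy rankings, and the choice of the Gini coefficient are assumptions made for illustration, not details taken from the paper.

```python
def retrievability(ranked_lists, doc_ids, cutoff):
    """Count, per document, how many queries retrieve it within the cut-off rank.

    ranked_lists: one ranked list of document ids per query (hypothetical input).
    doc_ids: every document in the collection, so unretrieved ones score 0.
    """
    scores = {doc_id: 0 for doc_id in doc_ids}
    for ranking in ranked_lists:
        for doc_id in ranking[:cutoff]:
            scores[doc_id] += 1
    return scores


def gini(values):
    """Gini coefficient of non-negative scores.

    0.0 means every document is equally retrievable; values closer to 1.0
    mean retrievability is concentrated on a few documents (assumed here
    as the bias summary).
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Toy example: three queries over a four-document collection, cut-off rank 2.
rankings = [
    ["d1", "d2", "d3"],
    ["d1", "d3", "d2"],
    ["d1", "d2", "d4"],
]
scores = retrievability(rankings, ["d1", "d2", "d3", "d4"], cutoff=2)
bias = gini(scores.values())
```

In this toy setting, a query expansion strategy that spreads top-ranked positions across more documents would lower `bias`, which matches the kind of effect the abstract attributes to clustering-based feedback selection.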