High findability of documents within a certain cut-off rank is considered an important factor in recall-oriented application domains such as patent or legal document retrieval. Findability is hindered by two factors: the inherent bias of the retrieval model, which favors some types of documents over others, and the failure to correctly capture and interpret the context of queries, which are typically rather short. In this paper, we analyze the impact of different retrieval models and query expansion strategies on this bias. We furthermore propose a novel query expansion strategy that uses document clustering to identify dominant relevant documents. This helps overcome a limitation of conventional query expansion strategies, which suffer strongly from the noise introduced by imperfect initial query results when selecting documents for pseudo-relevance feedback. Experiments with different collections of patent documents suggest that clustering-based document selection for pseudo-relevance feedback is an effective approach for increasing the findability of individual documents and decreasing the bias of a retrieval system.
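Findability within a cut-off rank and the bias of a retrieval system are commonly quantified with cumulative retrievability. Here r(d) counts the queries for which document d appears in the top c results, and the inequality of r(d) across the collection is summarised by a Gini coefficient. The abstract does not state which measures the paper uses, so treat the following as an assumed formulation rather than the authors' definition. The function names `retrievability` and `gini` are hypothetical, and every query is weighted equally.

```python
from typing import Dict, List, Sequence


def retrievability(rankings: Sequence[List[str]], all_docs: Sequence[str], cutoff: int) -> Dict[str, int]:
    """Cumulative retrievability r(d): number of queries ranking d within the top `cutoff`.

    Assumes every query has equal weight and each ranking lists a document at most once.
    """
    r = {d: 0 for d in all_docs}  # documents never retrieved keep r(d) = 0
    for ranking in rankings:
        for doc in ranking[:cutoff]:
            if doc in r:
                r[doc] += 1
    return r


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of r(d): 0 means equal findability, values near 1 mean strong bias."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = sum_i (2i - n - 1) * x_i / (n * sum_j x_j), with x sorted ascending and i = 1..n
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


if __name__ == "__main__":
    docs = ["d1", "d2", "d3", "d4"]
    rankings = [["d1", "d2", "d3"], ["d1", "d3", "d2"], ["d1", "d2", "d4"]]
    r = retrievability(rankings, docs, cutoff=2)
    print(r, round(gini(list(r.values())), 3))  # {'d1': 3, 'd2': 2, 'd3': 1, 'd4': 0} 0.417
```

A lower Gini coefficient after query expansion would correspond to the "decreasing the bias" effect described above. A document such as `d4`, which never reaches the cut-off, contributes a zero and pushes the coefficient up.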
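The abstract does not specify the clustering algorithm, how the "dominant" cluster is chosen, or how expansion terms are extracted. The following is therefore only a hypothetical sketch of clustering-based feedback-document selection under our own assumptions. It uses single-pass leader clustering with a cosine threshold over the top-ranked documents, picks the cluster with the highest sum of reciprocal ranks, and takes the most frequent non-query terms from that cluster as expansion terms. None of these choices, nor the names `select_feedback_docs`, `expansion_terms` and `threshold`, come from the paper.

```python
import math
from collections import Counter
from typing import List, Sequence


def tf_vector(text: str) -> Counter:
    # Naive whitespace tokenisation; a real system would also stem and drop stopwords.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def leader_clusters(vectors: List[Counter], threshold: float) -> List[List[int]]:
    """Single-pass leader clustering: each document, in rank order, joins the first
    cluster whose leader (first member) is similar enough, else starts a new cluster."""
    clusters: List[List[int]] = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vectors[cluster[0]], vec) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def select_feedback_docs(top_docs: Sequence[str], threshold: float = 0.3) -> List[int]:
    """Return the rank positions of the documents in the assumed 'dominant' cluster,
    scored by the sum of reciprocal ranks of its members."""
    clusters = leader_clusters([tf_vector(d) for d in top_docs], threshold)
    return max(clusters, key=lambda c: sum(1.0 / (i + 1) for i in c))


def expansion_terms(query: str, top_docs: Sequence[str], k: int = 3, threshold: float = 0.3) -> List[str]:
    """Return the k most frequent non-query terms in the dominant cluster."""
    query_terms = set(query.lower().split())
    counts: Counter = Counter()
    for i in select_feedback_docs(top_docs, threshold):
        counts.update(tf_vector(top_docs[i]))
    return [t for t, _ in counts.most_common() if t not in query_terms][:k]


if __name__ == "__main__":
    top_docs = [
        "pump valve seal pressure",
        "valve seal pressure housing",
        "engine piston crankshaft",  # off-topic result that plain top-k feedback would use
        "pump seal pressure gasket",
    ]
    print(select_feedback_docs(top_docs))  # [0, 1, 3]
    print(expansion_terms("pump valve", top_docs, k=2))
```

In this toy run, the off-topic third document forms its own low-scoring cluster, so its terms never enter the expanded query. This illustrates how cluster-based selection can filter out noise from an imperfect initial ranking. The reciprocal-rank score rewards clusters that are both large and highly ranked. Cluster size alone, or mean similarity to the query, would be equally plausible criteria.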