Information retrieval over clustered document collections proceeds in two successive stages: first identifying the best clusters, and then identifying the documents within those clusters that are most similar to the user query. In this paper, we assume that an inverted file over the entire document collection is used for the latter stage. We propose and evaluate algorithms for within-cluster searches, i.e., algorithms that integrate the best-clusters with the best-documents so that the final output contains the highest-ranked documents from the best clusters only. Our experiments on a TREC collection of 210,158 documents with several query sets show that an integration algorithm selected appropriately for the query length and the available system resources can significantly improve query evaluation efficiency.
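The two-stage process can be illustrated with a minimal sketch. It is not one of the algorithms evaluated in the paper, which the abstract does not specify. It shows only one conceivable integration strategy: traversing a collection-wide inverted file and discarding postings of documents outside the selected clusters. All names (`build_inverted_index`, `best_clusters`, `within_cluster_search`), the toy data, the centroid representation, and the raw term-frequency scoring are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a list of (doc_id, term_frequency) postings."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        counts = defaultdict(int)
        for term in text.lower().split():
            counts[term] += 1
        for term, tf in counts.items():
            index[term].append((doc_id, tf))
    return index


def best_clusters(query_terms, centroids, n_clusters):
    """Stage 1: rank clusters by a toy query-centroid score (assumed)."""
    scores = {
        cid: sum(weights.get(t, 0.0) for t in query_terms)
        for cid, weights in centroids.items()
    }
    ranked = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return set(ranked[:n_clusters])


def within_cluster_search(query, index, doc_cluster, centroids,
                          n_clusters, n_docs):
    """Stage 2: rank documents, keeping only those in the best clusters."""
    query_terms = query.lower().split()
    selected = best_clusters(query_terms, centroids, n_clusters)
    accumulators = defaultdict(float)
    for term in query_terms:
        for doc_id, tf in index.get(term, []):
            # Integration step: skip postings outside the best clusters.
            if doc_cluster[doc_id] in selected:
                accumulators[doc_id] += tf  # raw tf as a stand-in score
    ranked = sorted(accumulators.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n_docs]


# Hypothetical toy collection with two clusters.
docs = {
    "d1": "cluster based retrieval",
    "d2": "inverted file compression",
    "d3": "cluster search cluster",
    "d4": "retrieval with inverted file",
}
doc_cluster = {"d1": "c1", "d3": "c1", "d2": "c2", "d4": "c2"}
centroids = {
    "c1": {"cluster": 3.0, "retrieval": 1.0},
    "c2": {"inverted": 2.0, "file": 2.0, "retrieval": 1.0},
}

index = build_inverted_index(docs)
results = within_cluster_search(
    "cluster retrieval", index, doc_cluster, centroids,
    n_clusters=1, n_docs=10,
)
print(results)
```

In this toy run, document `d4` matches the query term "retrieval" but is excluded because its cluster is not among those selected. Filtering during posting traversal is only one possible design; other strategies, such as intersecting a ranked document list with cluster membership afterwards, trade memory for processing time differently, which is consistent with the abstract's point that the best choice depends on query length and system resources.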