We propose a cluster-based retrieval (CBR) strategy that uses a new cluster-skipping inverted file to improve query processing efficiency. The new inverted file incorporates cluster membership and centroid information, along with the usual document information, into a single structure. In our incremental-CBR strategy, both the best-matching clusters and the best-matching documents of those clusters are computed together during query evaluation, with a single posting-list access per query term. As we switch from term to term, the best clusters are recomputed and can change dynamically. During query-document matching, only the portions of the posting lists that correspond to the best clusters are considered; the rest are skipped. The proposed approach is tailored for environments where inverted files are compressed, and it provides substantial efficiency improvement while yielding comparable, or sometimes better, effectiveness figures. Our experiments with various collections show that the incremental-CBR strategy using a compressed cluster-skipping inverted file significantly improves CPU time efficiency, regardless of query length. The new compressed inverted file imposes an acceptable storage overhead in comparison to a typical inverted file. We also show that our approach scales well with the collection size.
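The query evaluation loop described above can be illustrated with a minimal in-memory sketch. It is not the implementation evaluated in the paper. The posting layout (each term mapping to groups of `(cluster_id, centroid_weight, postings)`), the additive scoring, and the names `toy_index`, `incremental_cbr`, `n_best_clusters`, and `k` are all assumptions made for illustration, and compression is omitted entirely.

```python
from collections import defaultdict
import heapq

# Hypothetical cluster-skipping layout (an assumption, not the paper's format):
# term -> list of (cluster_id, centroid_weight, [(doc_id, term_frequency), ...])
toy_index = {
    "cluster": [(0, 0.9, [(1, 3), (2, 1)]), (1, 0.1, [(5, 1)])],
    "index": [(0, 0.5, [(1, 1)]), (2, 0.8, [(7, 2), (8, 4)])],
}


def best_clusters(cluster_acc, n):
    """Return the ids of the n highest-scoring clusters seen so far."""
    return set(heapq.nlargest(n, cluster_acc, key=cluster_acc.get))


def incremental_cbr(index, query_terms, n_best_clusters=1, k=2):
    cluster_acc = defaultdict(float)  # cluster_id -> accumulated centroid score
    doc_acc = defaultdict(float)      # doc_id -> accumulated document score
    doc_cluster = {}                  # doc_id -> cluster_id it was found in

    for term in query_terms:
        groups = index.get(term, [])  # one posting-list access per query term

        # Update cluster scores from the centroid weights stored in the list.
        for cluster_id, centroid_weight, _ in groups:
            cluster_acc[cluster_id] += centroid_weight

        # The best clusters are recomputed after every term and may change.
        best = best_clusters(cluster_acc, n_best_clusters)

        # Score only documents in the current best clusters; skip other groups.
        for cluster_id, _, postings in groups:
            if cluster_id not in best:
                continue
            for doc_id, tf in postings:
                doc_acc[doc_id] += tf  # simplistic score: sum of raw tf
                doc_cluster[doc_id] = cluster_id

    # Keep only documents whose cluster is among the final best clusters.
    best = best_clusters(cluster_acc, n_best_clusters)
    ranked = [(d, s) for d, s in doc_acc.items() if doc_cluster[d] in best]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]


top_docs = incremental_cbr(toy_index, ["cluster", "index"])
```

In this toy run, cluster 0 stays the single best cluster across both terms, so the groups for clusters 1 and 2 are never scanned. In a compressed index, that skip is presumably where the savings come from, since the skipped portions would not need to be decompressed.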