We present a static index pruning method, to be used in ad-hoc document retrieval tasks, that follows a document-centric approach to decide whether a posting for a given term should remain in the index or not. The decision is made based on the term's contribution to the document's Kullback-Leibler divergence from the text collection's global language model. Our technique can be used to decrease the size of the index by over 90%, at only a minor decrease in retrieval effectiveness. It thus allows us to make the index small enough to fit entirely into the main memory of a single PC, even for large text collections containing millions of documents. This results in great efficiency gains, superior to those of earlier pruning methods, and an average response time around 20 ms on the GOV2 document collection.
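The document-centric decision described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the cutoff is chosen or how the language models are estimated, so the sketch assumes unsmoothed maximum-likelihood estimates and a fixed number of retained terms per document. The names `kld_contributions`, `prune_index`, and `keep_top`, as well as the toy documents, are hypothetical.

```python
import math
from collections import Counter


def kld_contributions(doc_tokens, collection_counts, collection_len):
    """Return each term's contribution P(t|d) * log(P(t|d) / P(t|C))
    to the KL divergence between the document and collection models."""
    tf = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    scores = {}
    for term, count in tf.items():
        p_doc = count / doc_len                               # P(t|d), maximum likelihood (assumption)
        p_coll = collection_counts[term] / collection_len     # P(t|C), maximum likelihood (assumption)
        scores[term] = p_doc * math.log(p_doc / p_coll)
    return scores


def prune_index(docs, keep_top=2):
    """Build an inverted index that keeps, for every document, only the
    keep_top terms with the largest KL contribution (hypothetical cutoff)."""
    collection_counts = Counter()
    for tokens in docs.values():
        collection_counts.update(tokens)
    collection_len = sum(collection_counts.values())

    index = {}
    for doc_id, tokens in docs.items():
        scores = kld_contributions(tokens, collection_counts, collection_len)
        # Highest contribution first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        for term in ranked[:keep_top]:
            index.setdefault(term, []).append(doc_id)
    return index


# Toy collection, purely illustrative.
docs = {
    1: "the cat sat on the mat".split(),
    2: "the dog sat on the log".split(),
    3: "the cat chased the dog".split(),
}
index = prune_index(docs, keep_top=2)
print(index)
```

In this toy collection the frequent term "the" is about as likely in each document as in the collection as a whole, so its contribution is small or negative and its postings are dropped, while rarer, document-specific terms are retained. A real system would tune the cutoff against retrieval effectiveness on a test collection.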