An Efficient Implementation of Edmonds' Algorithm for Maximum Matching on Graphs
Journal of the ACM (JACM)
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Computers and Intractability: A Guide to the Theory of NP-Completeness
Computers and Intractability: A Guide to the Theory of NP-Completeness
Improving System Performance with Compressed Memory
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Inverted file compression through document identifier reassignment
Information Processing and Management: an International Journal
Index Compression through Document Reordering
DCC '02 Proceedings of the Data Compression Conference
Assigning identifiers to documents to enhance the clustering property of fulltext indexes
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Three-level caching for efficient query processing in large Web search engines
WWW '05 Proceedings of the 14th international conference on World Wide Web
Inverted files for text search engines
ACM Computing Surveys (CSUR)
TSP and cluster-based solutions to the reassignment of document identifiers
Information Retrieval
Fast generation of result snippets in web search
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
The impact of caching on search engines
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Heavy-tailed distributions and multi-keyword queries
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Performance of compressed inverted list caching in search engines
Proceedings of the 17th international conference on World Wide Web
Static pruning of terms in inverted files
ECIR'07 Proceedings of the 29th European conference on IR research
Sorting out the document identifier assignment problem
ECIR'07 Proceedings of the 29th European conference on IR research
Linear time 1/2 -approximation algorithm for maximum weighted matching in general graphs
STACS'99 Proceedings of the 16th annual conference on Theoretical aspects of computer science
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Hi-index | 0.00 |
This paper proposes to exploit content and usage information to rearrange an inverted index for a full-text IR system. The idea is to merge the entries of two frequently co-occurring terms, either in the collection or in the answered queries, to form a single, paired, entry. Since postings common to paired terms are not replicated, the resulting index is more compact. In addition, queries containing terms that have been paired are answered faster since we can exploit the pre-computed posting intersection. In order to choose which terms have to be paired, we formulate the term pairing problem as a Maximum-Weight Matching Graph problem, and we evaluate in our scenario efficiency and efficacy of both an exact and a heuristic solution. We apply our technique: (i ) to compact a compressed inverted file built on an actual Web collection of documents, and (ii ) to increase capacity of an in-memory posting list. Experiments showed that in the first case our approach can improve the compression ratio of up to 7.7%, while we measured a saving from 12% up to 18% in the size of the posting cache.