An important concern in the design of search engines is the construction of an inverted index. An inverted index, also called a concordance, contains a list of documents (a posting list) for every possible search term. These posting lists are usually compressed with difference coding, which stores the gaps between successive document numbers rather than the numbers themselves. Difference coding yields the best compression when the lists to be coded have high locality, that is, when the document numbers in a list lie close together and the gaps are small. Coding methods have been designed specifically to take advantage of locality in inverted indices. Here, we describe an algorithm that permutes the document numbers so as to create locality in an inverted index. This is done by clustering the documents. Our algorithm, when applied to the TREC ad hoc database (disks 4 and 5), improves the performance of the best difference coding algorithm we found by fourteen percent. The improvement increases as the size of the index increases, so we expect that greater improvements would be possible on larger datasets.
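The effect of renumbering on difference coding can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy index, the hand-written permutation `clustered_ids` (standing in for the output of a clustering step, which is not reproduced here), and the variable-byte cost model, which is a common gap coder but not necessarily the one evaluated in the paper.

```python
def gaps(postings):
    """Difference-code a sorted posting list: first ID, then successive gaps."""
    out = []
    prev = 0
    for doc_id in postings:
        out.append(doc_id - prev)
        prev = doc_id
    return out


def varbyte_len(n):
    """Bytes needed to store n with a 7-bits-per-byte variable-byte code."""
    length = 1
    while n >= 128:
        n >>= 7
        length += 1
    return length


def index_cost(index):
    """Total bytes to store every posting list as variable-byte coded gaps."""
    return sum(
        varbyte_len(g)
        for postings in index.values()
        for g in gaps(sorted(postings))
    )


def renumber(index, permutation):
    """Apply a document-number permutation to every posting list."""
    return {
        term: sorted(permutation[d] for d in postings)
        for term, postings in index.items()
    }


# Hypothetical toy index: documents on the same topic are scattered.
toy_index = {
    "compression": [1, 300, 600, 900],
    "clustering": [150, 450, 750],
}

# Hypothetical permutation, written by hand, that places documents
# sharing a term next to each other (what a clustering step aims for).
clustered_ids = {1: 1, 300: 2, 600: 3, 900: 4, 150: 5, 450: 6, 750: 7}

before = index_cost(toy_index)
after = index_cost(renumber(toy_index, clustered_ids))
print(before, after)
```

In this toy case most gaps shrink from about 300 to 1, so each fits in a single byte instead of two. Real collections have overlapping vocabularies, so no permutation makes every list contiguous, and the gain depends on how well the clustering groups documents that share terms.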