Modeling word occurrences for the compression of concordances
ACM Transactions on Information Systems (TOIS)
Managing Gigabytes: Compressing and Indexing Documents and Images
Managing Gigabytes: Compressing and Indexing Documents and Images
Compression of inverted indexes For fast query evaluation
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Information Retrieval
Inverted file compression through document identifier reassignment
Information Processing and Management: an International Journal
Implementation of the SMART Information Retrieval System
Implementation of the SMART Information Retrieval System
The Link Database: Fast Access to Graphs of the Web
DCC '02 Proceedings of the Data Compression Conference
Index Compression through Document Reordering
DCC '02 Proceedings of the Data Compression Conference
The webgraph framework I: compression techniques
Proceedings of the 13th international conference on World Wide Web
Assigning identifiers to documents to enhance the clustering property of fulltext indexes
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Inverted Index Compression Using Word-Aligned Binary Codes
Information Retrieval
Simplified similarity scoring using term ranks
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
The automatic creation of literature abstracts
IBM Journal of Research and Development
Document identifier reassignment through dimensionality reduction
ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research
Incremental cluster-based retrieval using compressed cluster-skipping inverted files
ACM Transactions on Information Systems (TOIS)
Site-based dynamic pruning for query processing in search engines
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
WAW '09 Proceedings of the 6th International Workshop on Algorithms and Models for the Web-Graph
Inverted index compression and query processing with optimized document ordering
Proceedings of the 18th international conference on World wide web
On compressing social networks
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Compact full-text indexing of versioned document collections
Proceedings of the 18th ACM conference on Information and knowledge management
Entry Pairing in Inverted File
WISE '09 Proceedings of the 10th International Conference on Web Information Systems Engineering
On compressing the textual web
Proceedings of the third ACM international conference on Web search and data mining
ACM Transactions on Information Systems (TOIS)
Scalable techniques for document identifier assignment in inverted indexes
Proceedings of the 19th international conference on World wide web
Mining Query Logs: Turning Search Usage Data into Knowledge
Foundations and Trends in Information Retrieval
VSEncoding: efficient coding and fast decoding of integer lists via dynamic programming
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Improved index compression techniques for versioned document collections
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Engineering basic algorithms of an in-memory text search engine
ACM Transactions on Information Systems (TOIS)
Colored range queries and document retrieval
SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Inverted index compression via online document routing
Proceedings of the 20th international conference on World wide web
A Comprehensive Study of Features and Algorithms for URL-Based Topic Classification
ACM Transactions on the Web (TWEB)
Faster temporal range queries over versioned text
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Faster top-k document retrieval using block-max indexes
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Effect of different docid orderings on dynamic pruning retrieval strategies
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Optimized top-k processing with global page scores on block-max indexes
Proceedings of the fifth ACM international conference on Web search and data mining
Reordering an index to speed query processing without loss of effectiveness
Proceedings of the Seventeenth Australasian Document Computing Symposium
Optimizing top-k document retrieval strategies for block-max indexes
Proceedings of the sixth ACM international conference on Web search and data mining
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
A candidate filtering mechanism for fast top-k query processing on modern cpus
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Document vector representations for feature extraction in multi-stage document ranking
Information Retrieval
Using rating matrix compression techniques to speed up collaborative recommendations
Information Retrieval
Hi-index | 0.00 |
The compression of Inverted File indexes in Web Search Engines has received a lot of attention in these last years. Compressing the index not only reduces space occupancy but also improves the overall retrieval performance since it allows a better exploitation of the memory hierarchy. In this paper we are going to empirically show that in the case of collections of Web Documents we can enhance the performance of compression algorithms by simply assigning identifiers to documents according to the lexicographical ordering of the URLs. We will validate this assumption by comparing several assignment techniques and several compression algorithms on a quite large document collection composed by about six million documents. The results are very encouraging since we can improve the compression ratio up to 40% using an algorithm that takes about ninety seconds to finish using only 100 MB of main memory.