Scatter/Gather: a cluster-based approach to browsing large document collections
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Compression of inverted indexes For fast query evaluation
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Binary Interpolative Coding for Effective Index Compression
Information Retrieval
Mining the Web: Discovering Knowledge from HyperText Data
Mining the Web: Discovering Knowledge from HyperText Data
Inverted file compression through document identifier reassignment
Information Processing and Management: an International Journal
Implementation of the SMART Information Retrieval System
Implementation of the SMART Information Retrieval System
Index Compression through Document Reordering
DCC '02 Proceedings of the Data Compression Conference
Assigning document identifiers to enhance compressibility of Web Search Engines indexes
Proceedings of the 2004 ACM symposium on Applied computing
Index compression using fixed binary codewords
ADC '04 Proceedings of the 15th Australasian database conference - Volume 27
Inverted Index Compression Using Word-Aligned Binary Codes
Information Retrieval
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Inverted files for text search engines
ACM Computing Surveys (CSUR)
Efficient search in large textual collections with redundancy
Proceedings of the 16th international conference on World Wide Web
Incremental cluster-based retrieval using compressed cluster-skipping inverted files
ACM Transactions on Information Systems (TOIS)
Inverted index compression and query processing with optimized document ordering
Proceedings of the 18th international conference on World wide web
Compressing term positions in web indexes
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Compact full-text indexing of versioned document collections
Proceedings of the 18th ACM conference on Information and knowledge management
Entry Pairing in Inverted File
WISE '09 Proceedings of the 10th International Conference on Web Information Systems Engineering
Sorting out the document identifier assignment problem
ECIR'07 Proceedings of the 29th European conference on IR research
Scalable techniques for document identifier assignment in inverted indexes
Proceedings of the 19th international conference on World wide web
Mining Query Logs: Turning Search Usage Data into Knowledge
Foundations and Trends in Information Retrieval
VSEncoding: efficient coding and fast decoding of integer lists via dynamic programming
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Improved index compression techniques for versioned document collections
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Engineering basic algorithms of an in-memory text search engine
ACM Transactions on Information Systems (TOIS)
Faster temporal range queries over versioned text
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Efficient query evaluation through access-reordering
AIRS'06 Proceedings of the Third Asia conference on Information Retrieval Technology
A software architecture for effective document identifier reassignment
EUROCAST'05 Proceedings of the 10th international conference on Computer Aided Systems Theory
Document identifier reassignment through dimensionality reduction
ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Using rating matrix compression techniques to speed up collaborative recommendations
Information Retrieval
Hi-index | 0.00 |
Web Search Engines provide a large-scale text document retrieval service by processing huge Inverted File indexes. Inverted File indexes allow fast query resolution and good memory utilization since their d-gaps representation can be effectively and efficiently compressed by using variable length encoding methods. This paper proposes and evaluates some algorithms aimed to find an assignment of the document identifiers which minimizes the average values of d-gaps, thus enhancing the effectiveness of traditional compression methods. We ran several tests over the Google contest collection in order to validate the techniques proposed. The experiments demonstrated the scalability and effectiveness of our algorithms. Using the proposed algorithms, we were able to sensibly improve (up to 20.81%) the compression ratios of several encoding schemes.