Compact full-text indexing of versioned document collections. Proceedings of the 18th ACM conference on Information and knowledge management
Entry Pairing in Inverted File. WISE '09 Proceedings of the 10th International Conference on Web Information Systems Engineering
Scalable techniques for document identifier assignment in inverted indexes. Proceedings of the 19th international conference on World wide web
Engineering basic algorithms of an in-memory text search engine. ACM Transactions on Information Systems (TOIS)
Inverted index compression via online document routing. Proceedings of the 20th international conference on World wide web
Faster top-k document retrieval using block-max indexes. Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Proceedings of the sixth ACM international conference on Web search and data mining
Bitlist: new full-text index for low space cost and efficient keyword search. Proceedings of the VLDB Endowment
Using rating matrix compression techniques to speed up collaborative recommendations. Information Retrieval
Recent studies have shown that the size of Inverted Files (IF) can be reduced by reassigning the document identifiers of the original collection, because a good reassignment shrinks the gaps (d-gaps) between the identifiers of documents that contain the same term. Variable-bit encoding schemes exploit this smaller average gap to reduce the total number of bits per document pointer. This paper presents an efficient solution to the reassignment problem that first reduces the dimensionality of the input data with a Singular Value Decomposition (SVD) and then treats the reordering as a Travelling Salesman Problem (TSP). We also present efficient solutions based on clustering, and finally combine the TSP and clustering strategies for reordering the document identifiers. We report experimental tests and performance results on two TREC text collections, obtaining good compression ratios with low running times, and suggest that the techniques presented here could lead to scalable solutions for web collections.
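To make the link between identifier reassignment and index size concrete, here is a minimal sketch, not the authors' method. It makes several assumptions. Elias gamma stands in as the variable-bit code. Documents are modelled as plain sets of terms. Jaccard similarity is the document-to-document affinity. The TSP is approximated with a greedy nearest-neighbour tour. The SVD step and the clustering strategies described in the abstract are omitted. The function names `gamma_bits`, `index_size_bits`, `jaccard` and `greedy_tsp_order` are hypothetical.

```python
def gamma_bits(gap: int) -> int:
    """Length in bits of the Elias gamma code for a positive integer gap."""
    return 2 * (gap.bit_length() - 1) + 1


def index_size_bits(docs: list[set[str]], order: list[int]) -> int:
    """Total gamma-coded size of all posting lists when document docs[order[i]]
    receives the new identifier i (identifiers start at 0)."""
    new_id = {old: new for new, old in enumerate(order)}
    postings: dict[str, list[int]] = {}
    for old, terms in enumerate(docs):
        for term in terms:
            postings.setdefault(term, []).append(new_id[old])
    total = 0
    for ids in postings.values():
        ids.sort()
        prev = -1  # so the first gap is id + 1, keeping every gap >= 1
        for d in ids:
            total += gamma_bits(d - prev)
            prev = d
    return total


def jaccard(a: set[str], b: set[str]) -> float:
    """Similarity of two documents as the Jaccard coefficient of their term sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def greedy_tsp_order(docs: list[set[str]]) -> list[int]:
    """Nearest-neighbour TSP heuristic: start at document 0 and repeatedly move
    to the most similar unvisited document (ties go to the lowest index)."""
    if not docs:
        return []
    order = [0]
    unvisited = set(range(1, len(docs)))
    while unvisited:
        current = docs[order[-1]]
        nxt = max(unvisited, key=lambda j: (jaccard(current, docs[j]), -j))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


# Toy collection whose original identifiers interleave two unrelated topics.
docs = [
    {"apple", "banana"}, {"car", "engine"}, {"apple", "cherry"},
    {"car", "wheel"}, {"banana", "cherry"}, {"engine", "wheel"},
]
original = index_size_bits(docs, list(range(len(docs))))
tour = greedy_tsp_order(docs)
reordered = index_size_bits(docs, tour)
print(original, tour, reordered)
```

On this toy collection the tour is `[0, 2, 4, 1, 3, 5]`, which groups the fruit documents ahead of the vehicle documents, and the gamma-coded postings shrink from 38 to 30 bits. The greedy tour compares every current document against all unvisited ones, which makes it quadratic in the collection size. Reducing dimensionality or first partitioning documents into clusters, as the abstract describes, is one plausible way to keep that cost manageable on large collections.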