The inverted file is the most popular indexing mechanism for document search in information retrieval systems. Compressing an inverted file can greatly improve document search speed. Traditionally, inverted files are compressed with the d-gap technique, which replaces each document identifier in a posting list with its difference from the preceding identifier, a gap value that is usually much smaller. However, when gap values fluctuate widely, well-known prefix-free codes cannot compress them efficiently. To smooth and reduce the gap values, we propose a document-identifier reassignment algorithm based on a similarity factor between documents. We generate a reassignment order for all documents according to this similarity, so that closely related documents receive nearby identifiers. Simulation results show that the average gap value of sample inverted files can be reduced by 30%, and that the compression rate of d-gapped inverted files encoded with prefix-free codes can be improved by 15%.
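To make the mechanism concrete, here is a minimal sketch of why reassigning identifiers helps d-gap compression. The abstract does not define its similarity factor or how the reassignment order is built, so this sketch assumes Jaccard similarity over term sets and a greedy nearest-neighbour ordering, and it uses Elias gamma as one example of a well-known prefix-free code. All function names and the toy collection are hypothetical and only illustrate the idea. This is not the authors' algorithm.

```python
from typing import Dict, List, Set


def d_gaps(postings: List[int]) -> List[int]:
    """Turn a sorted posting list into d-gaps (the first entry is measured from 0)."""
    gaps: List[int] = []
    prev = 0
    for doc_id in postings:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def gamma_bits(n: int) -> int:
    """Bit length of the Elias gamma code for a positive integer n."""
    if n < 1:
        raise ValueError("Elias gamma only encodes positive integers")
    return 2 * (n.bit_length() - 1) + 1


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Hypothetical similarity factor: shared terms divided by all distinct terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_order(docs: Dict[int, Set[str]]) -> List[int]:
    """Hypothetical ordering: each next document is the unvisited one most similar
    to the last visited document (the paper's actual method may differ)."""
    remaining = set(docs)
    current = min(remaining)
    order = [current]
    remaining.remove(current)
    while remaining:
        # Ties go to the smallest old identifier because max() returns the first maximum.
        current = max(sorted(remaining), key=lambda d: jaccard(docs[order[-1]], docs[d]))
        order.append(current)
        remaining.remove(current)
    return order


def reassign(order: List[int]) -> Dict[int, int]:
    """Map each old identifier to a new consecutive identifier 1..n in visiting order."""
    return {old: new for new, old in enumerate(order, start=1)}


def build_index(docs: Dict[int, Set[str]], mapping: Dict[int, int]) -> Dict[str, List[int]]:
    """Build term -> sorted posting list using the (possibly reassigned) identifiers."""
    index: Dict[str, List[int]] = {}
    for old_id, terms in docs.items():
        for term in terms:
            index.setdefault(term, []).append(mapping[old_id])
    return {term: sorted(ids) for term, ids in index.items()}


def gamma_cost(index: Dict[str, List[int]]) -> int:
    """Total bits needed to store every d-gapped posting list with Elias gamma."""
    return sum(gamma_bits(g) for ids in index.values() for g in d_gaps(ids))


# Toy collection: similar documents deliberately have non-adjacent identifiers.
DOCS: Dict[int, Set[str]] = {
    1: {"apple", "banana"},
    2: {"car", "engine"},
    3: {"apple", "banana", "cherry"},
    4: {"car", "engine", "wheel"},
    5: {"apple", "cherry"},
    6: {"engine", "wheel"},
}

if __name__ == "__main__":
    identity = {d: d for d in DOCS}
    order = greedy_order(DOCS)
    print("order:", order)                                                     # [1, 3, 5, 2, 4, 6]
    print("original bits:", gamma_cost(build_index(DOCS, identity)))           # 40
    print("reassigned bits:", gamma_cost(build_index(DOCS, reassign(order))))  # 28
```

On this toy collection, placing the "fruit" documents and the "vehicle" documents next to each other turns most gaps into 1s. Gamma encodes a 1 in a single bit, so the index drops from 40 to 28 bits. These numbers only illustrate the mechanism and are unrelated to the reported 30% and 15% figures. The greedy ordering needs a quadratic number of similarity comparisons, so it is only practical for small collections.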