Assigning identifiers to documents to enhance the clustering property of fulltext indexes
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Granting efficient access to the index is a key issue for the performance of Web Search Engines (WSEs). To improve memory utilization and speed up query resolution, WSEs use Inverted File (IF) indexes in which the posting lists are stored as sequences of d_gaps (i.e., differences between successive document identifiers) compressed with variable-length encoding methods. This paper describes a lightweight clustering algorithm that assigns identifiers to documents in a way that minimizes the average value of the d_gaps. Simulations performed on a real dataset, the Google contest collection, show that our approach yields an IF index that is, depending on the d_gap encoding chosen, up to 23% smaller than one built over randomly assigned document identifiers. Moreover, we show, both analytically and empirically, that the complexity of our algorithm is linear in space and time.
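To make the role of d_gaps concrete, the following is a minimal sketch of why identifier assignment affects index size. It is not the clustering algorithm described in the paper. It assumes variable-byte coding as one possible variable-length encoding, and the helper names (`d_gaps`, `vbyte_encode`, `encoded_size`) and the two sample posting lists are hypothetical, chosen only for illustration.

```python
def d_gaps(postings):
    """Convert a strictly increasing list of document IDs into d_gaps.

    The first gap is the first ID itself (measured from 0); every later
    gap is the difference from the preceding ID.
    """
    gaps = []
    prev = 0
    for doc_id in postings:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def vbyte_encode(n):
    """Variable-byte code (an assumed encoding, not necessarily the paper's).

    Each byte carries 7 payload bits, least significant group first;
    the high bit is set only on the final byte of the number.
    """
    out = bytearray()
    while n >= 128:
        out.append(n & 0x7F)
        n >>= 7
    out.append(n | 0x80)
    return bytes(out)


def encoded_size(postings):
    """Total number of bytes needed to store the posting list as coded d_gaps."""
    return sum(len(vbyte_encode(g)) for g in d_gaps(postings))


# Hypothetical posting lists for one term, five documents each.
# In the first, the matching documents received neighbouring identifiers;
# in the second, the identifiers are spread across the collection.
clustered = [1000, 1001, 1002, 1003, 1004]
scattered = [1000, 5000, 90000, 250000, 4000000]

print(d_gaps(clustered))        # [1000, 1, 1, 1, 1]
print(encoded_size(clustered))  # 6
print(encoded_size(scattered))  # 14
```

Both lists hold five postings, but the list whose documents have neighbouring identifiers needs fewer than half the bytes under this encoding. An assignment that places similar documents close together in the identifier space aims to produce this effect across the whole index.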