The connectivity server: fast access to linkage information on the Web
WWW7 Proceedings of the seventh international conference on World Wide Web 7
PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Inverted file compression through document identifier reassignment
Information Processing and Management: an International Journal
Index Compression through Document Reordering
DCC '02 Proceedings of the Data Compression Conference
The webgraph framework I: compression techniques
Proceedings of the 13th international conference on World Wide Web
The Art of Computer Programming, Volume 4, Fascicle 2: Generating All Tuples and Permutations (Art of Computer Programming)
Sorting out the document identifier assignment problem
ECIR'07 Proceedings of the 29th European conference on IR research
Document identifier reassignment through dimensionality reduction
ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research
Fast and Compact Web Graph Representations
ACM Transactions on the Web (TWEB)
Multiscale approach for the network compression-friendly ordering
Journal of Discrete Algorithms
Proceedings of the 20th international conference on World wide web
Practical representations for web and social graphs
Proceedings of the 20th ACM international conference on Information and knowledge management
Extended compact web graph representations
Algorithms and Applications
Compressed representation of web and social networks via dense subgraphs
SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
Since the first investigations on web graph compression, it has been clear that the ordering of the nodes of the graph has a fundamental influence on the compression rate (usually expressed as the number of bits per link). The authors of the LINK database [1], for instance, investigated three different approaches: an extrinsic ordering (URL ordering) and two intrinsic (or coordinate-free) orderings based on the rows of the adjacency matrix (lexicographic and Gray code); they concluded that URL ordering has many advantages in spite of a small penalty in compression. In this paper we approach this issue in a more systematic way, testing some old orderings and proposing some new ones. Our experiments are made in the WebGraph framework [2], and show that the compression technique and the structure of the graph can produce significantly different results. In particular, we show that for the transpose web graph URL ordering is significantly less effective, and that some new orderings combining host information and Gray/lexicographic orderings outperform all previous methods. On some large transposed graphs, these new orderings yield the remarkable compression rate of 1 bit per link.
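To make the two intrinsic orderings concrete, here is a minimal sketch in Python of sorting nodes by the rows of the adjacency matrix, lexicographically and in Gray code order. It is an illustration under stated assumptions, not the implementation used in the WebGraph framework: the function names (`adjacency_row`, `gray_rank`, `lexicographic_order`, `gray_order`) and the toy graph are hypothetical, rows are read with column 0 as the most significant bit, and rows are materialised densely, which is only feasible for very small graphs.

```python
from itertools import accumulate
from operator import xor


def adjacency_row(successors, n):
    """Return the dense 0/1 adjacency-matrix row of a node (toy graphs only)."""
    row = [0] * n
    for target in successors:
        row[target] = 1
    return tuple(row)


def gray_rank(row):
    """Map a bit vector to its position in the reflected binary Gray code.

    The position is the prefix XOR of the bits, read from the most
    significant bit (assumed here to be column 0).
    """
    return tuple(accumulate(row, xor))


def lexicographic_order(graph):
    """Node ids sorted by adjacency row, compared lexicographically."""
    n = len(graph)
    return sorted(range(n), key=lambda v: adjacency_row(graph[v], n))


def gray_order(graph):
    """Node ids sorted by the Gray code position of their adjacency row."""
    n = len(graph)
    return sorted(range(n), key=lambda v: gray_rank(adjacency_row(graph[v], n)))


# Hypothetical 4-node graph given as successor lists.
graph = [[1, 2], [0], [0, 1], []]
lex = lexicographic_order(graph)
gray = gray_order(graph)
```

On this toy graph the two orderings differ only in the relative position of nodes 1 and 2. Under Gray code ordering, consecutive rows tend to differ in few positions, which is the property that makes it attractive for compression schemes that exploit similarity between adjacent successor lists.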