Tight and simple Web graph compression for forward and reverse neighbor queries

Authors:
Szymon Grabowski; Wojciech Bieniecki
Venue:
Discrete Applied Mathematics
Year:
2014

Citing 16
Cited 0

The connectivity server: fast access to linkage information on the Web
WWW7 Proceedings of the seventh international conference on World Wide Web 7

A Fast General Methodology for Information-Theoretically Optimal Encodings of Graphs
SIAM Journal on Computing

Succinct representation of balanced parentheses, static trees and planar graphs
FOCS '97 Proceedings of the 38th Annual Symposium on Foundations of Computer Science

Succinct static data structures

The webgraph framework I: compression techniques
Proceedings of the 13th international conference on World Wide Web

UbiCrawler: a scalable fully distributed web crawler
Software—Practice & Experience

Compressed full-text indexes
ACM Computing Surveys (CSUR)

A scalable pattern mining approach to web graph compression with communities
WSDM '08 Proceedings of the 2008 International Conference on Web Search and Data Mining

Efficient Compression of Web Graphs
COCOON '08 Proceedings of the 14th annual international conference on Computing and Combinatorics

k²-Trees for Compact Web Graph Representation
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval

Local Modeling for WebGraph Compression
DCC '10 Proceedings of the 2010 Data Compression Conference

Fast and Compact Web Graph Representations
ACM Transactions on the Web (TWEB)

Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks
Proceedings of the 20th international conference on World wide web

Practical representations for web and social graphs
Proceedings of the 20th ACM international conference on Information and knowledge management

Extended compact web graph representations
Algorithms and Applications

Enhanced byte codes with restricted prefix properties
SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval

Abstract

Analyzing Web graphs has applications in determining page ranks, fighting Web spam, detecting communities and mirror sites, and more. Such analysis is, however, hampered by the need to store a major part of these huge graphs in external memory, which prevents efficient random access to edge (hyperlink) lists. A number of algorithms based on compression techniques have therefore been proposed to represent Web graphs succinctly while still providing random access. These techniques are usually based on differential encoding of the adjacency lists, on finding repeating nodes or node regions in successive lists, on more general grammar-based transformations, or on 2-dimensional representations of the binary matrix of the graph. In this paper we present three Web graph compression algorithms. The first can be seen as an engineering of the Boldi and Vigna (2004) [8] method: we extend the notion of similarity between link lists and use a more compact encoding of residuals. The algorithm works on blocks of varying size (measured in the number of input lists) and sacrifices access time for a better compression ratio, achieving a more succinct graph representation than other algorithms reported in the literature. The second algorithm works on blocks of the same size (again measured in the number of input lists). Its key mechanism is merging the block into a single ordered list. This method achieves much more attractive space-time tradeoffs. Finally, we present an algorithm supporting bidirectional neighbor queries, which offers compression ratios better than those known from the literature.
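
Two ideas named in the abstract can be illustrated with a small Python sketch: differential (gap) encoding of a sorted adjacency list, and merging a block of lists into a single ordered list. This is not the authors' algorithm. The function names (`gap_encode`, `gap_decode`, `merge_block`, `extract_list`) and the per-target bitmask used to record which list contains each target are assumptions made only for this illustration, and no entropy coding or byte packing is attempted.

```python
from typing import Dict, List, Tuple


def gap_encode(neighbors: List[int]) -> List[int]:
    """Differential encoding: first value, then differences between successive values."""
    gaps = []
    prev = 0
    for v in neighbors:  # neighbors must be sorted in increasing order
        gaps.append(v - prev)
        prev = v
    return gaps


def gap_decode(gaps: List[int]) -> List[int]:
    """Inverse of gap_encode: a running sum restores the original values."""
    out = []
    total = 0
    for g in gaps:
        total += g
        out.append(total)
    return out


def merge_block(lists: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Merge a block of adjacency lists into one sorted list of distinct targets.

    Returns the merged list and, for each target, a bitmask (an assumption of
    this sketch) whose bit i is set when list i of the block contains it.
    """
    masks: Dict[int, int] = {}
    for i, lst in enumerate(lists):
        for v in lst:
            masks[v] = masks.get(v, 0) | (1 << i)
    merged = sorted(masks)
    return merged, [masks[v] for v in merged]


def extract_list(merged: List[int], flags: List[int], i: int) -> List[int]:
    """Recover the i-th adjacency list of the block from the merged form."""
    return [v for v, m in zip(merged, flags) if (m >> i) & 1]


block = [[2, 5, 9], [2, 5, 10], [5, 9, 10]]
merged, flags = merge_block(block)
encoded = gap_encode(merged)  # small gaps are cheaper to store than raw ids
restored = [extract_list(gap_decode(encoded), flags, i) for i in range(len(block))]
```

In this toy block, nine stored targets collapse to four merged values plus four small bitmasks, and every original list can be recovered by scanning the merged list once. That hints at why answering a query means decoding a whole block rather than a single list.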