A scalable pattern mining approach to web graph compression with communities

Authors:
Gregory Buehrer; Kumar Chellapilla
Affiliations:
The Ohio State University, Columbus, OH; Microsoft Live Labs, Redmond, WA
Venue:
WSDM '08 Proceedings of the 2008 International Conference on Web Search and Data Mining
Year:
2008

Abstract

A link server is a system designed to support efficient implementations of graph computations on the web graph. In this work, we present a compression scheme for the web graph specifically designed to accommodate community queries and other random access algorithms on link servers. We use a frequent pattern mining approach to extract meaningful connectivity formations. Our Virtual Node Miner achieves graph compression without sacrificing random access by generating virtual nodes from frequent itemsets in vertex adjacency lists. The mining phase guarantees scalability by bounding the pattern mining complexity to O(E log E). We facilitate global mining, relaxing the requirement for the graph to be sorted by URL, which enables discovery of both inter-domain and intra-domain patterns. As a consequence, the approach allows incremental graph updates. Further, it not only facilitates but can also expedite graph computations such as PageRank and local random walks by implementing them directly on the compressed graph. We demonstrate the effectiveness of the proposed approach on several publicly available large web graph data sets. Experimental results indicate that the proposed algorithm achieves a 10- to 15-fold compression on most real-world web graph data sets.
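
The virtual-node idea in the abstract can be illustrated with a minimal sketch. It is not the authors' Virtual Node Miner: the mining step is omitted and the shared pattern is simply supplied by hand. The function names (`add_virtual_node`, `edge_count`, `neighbors`), the toy graph, and the virtual node label `"V1"` are all assumptions made for illustration. The sketch shows only how replacing a set of out-links shared by several adjacency lists with one virtual node reduces the edge count while still allowing the original neighbors of any node to be recovered.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def add_virtual_node(adj: Graph, pattern: Set[str], virtual_id: str) -> Graph:
    """Return a new graph in which every adjacency list that contains
    `pattern` points to `virtual_id` instead of the pattern's members.
    The virtual node itself links to the pattern's members."""
    out: Graph = {}
    for node, nbrs in adj.items():
        if pattern <= nbrs:
            out[node] = (nbrs - pattern) | {virtual_id}
        else:
            out[node] = set(nbrs)
    out[virtual_id] = set(pattern)
    return out


def edge_count(adj: Graph) -> int:
    """Total number of directed edges stored in the graph."""
    return sum(len(nbrs) for nbrs in adj.values())


def neighbors(adj: Graph, node: str, virtual_ids: Set[str]) -> Set[str]:
    """Recover the original out-links of `node` by expanding virtual nodes."""
    result: Set[str] = set()
    stack = list(adj[node])
    while stack:
        n = stack.pop()
        if n in virtual_ids:
            stack.extend(adj[n])  # follow the virtual node to its targets
        else:
            result.add(n)
    return result


# Hypothetical toy graph: three pages share the out-links {x, y, z}.
original: Graph = {
    "a": {"x", "y", "z", "p"},
    "b": {"x", "y", "z"},
    "c": {"x", "y", "z", "q"},
    "d": {"p"},
}
# Assumed pattern, given by hand here; a real system would mine it.
compressed = add_virtual_node(original, {"x", "y", "z"}, "V1")

print(edge_count(original))    # 12
print(edge_count(compressed))  # 9
```

In this toy case a pattern of size 3 shared by 3 lists replaces 9 edges with 3 + 3 = 6, and `neighbors(compressed, "a", {"V1"})` still returns the original out-links of `a`, which is the random-access property the abstract refers to.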