Local graph sparsification for scalable clustering

Authors:
Venu Satuluri;Srinivasan Parthasarathy;Yiye Ruan
Affiliations:
The Ohio State University, Columbus, OH, USA;The Ohio State University, Columbus, OH, USA;The Ohio State University, Columbus, OH, USA
Venue:
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Year:
2011



Abstract

In this paper we look at how to sparsify a graph, i.e., how to reduce the edge set while keeping the nodes intact, so as to enable faster graph clustering without sacrificing quality. The main idea behind our approach is to preferentially retain the edges whose endpoints are likely to belong to the same cluster. We propose to rank edges using a simple similarity-based heuristic that we compute efficiently by comparing the minhash signatures of the nodes incident to the edge. For each node, we select the top few edges to be retained in the sparsified graph. Extensive empirical results on several real networks, using four state-of-the-art graph clustering and community discovery algorithms, reveal that our proposed approach realizes excellent speedups (often in the range of 10-50x), with little or no deterioration in the quality of the resulting clusters. In fact, for at least two of the four clustering algorithms, our sparsification consistently enables higher clustering accuracies.
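
The per-node ranking described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not state: edge similarity is taken to be the Jaccard similarity of the two endpoints' neighbor sets, the minhash functions are simple linear hashes modulo a prime, and each node of degree d keeps ceil(d ** exponent) edges. The names `minhash_signature`, `sparsify`, `num_hashes`, and `exponent` are hypothetical.

```python
import math
import random

PRIME = 2_147_483_647  # a large prime; node ids are assumed to be ints below it


def minhash_signature(items, seeds):
    """One min-hash value per (a, b) pair, using h(x) = (a*x + b) mod PRIME."""
    return [min((a * x + b) % PRIME for x in items) for a, b in seeds]


def sparsify(adj, num_hashes=64, exponent=0.5, rng_seed=0):
    """Return the retained undirected edges as a set of (small_id, large_id).

    adj: dict mapping an int node id to the set of its neighbors; the
    adjacency is assumed symmetric (undirected graph).
    exponent: assumed retention rule, keep ceil(degree ** exponent) edges.
    """
    rng = random.Random(rng_seed)
    seeds = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
             for _ in range(num_hashes)]
    # Signature of each node's neighbor set (isolated nodes are skipped).
    sigs = {u: minhash_signature(nbrs, seeds) for u, nbrs in adj.items() if nbrs}

    kept = set()
    for u, nbrs in adj.items():
        if not nbrs:
            continue

        def estimated_similarity(v):
            # The fraction of matching signature slots estimates Jaccard.
            matches = sum(1 for s, t in zip(sigs[u], sigs[v]) if s == t)
            return matches / num_hashes

        # Highest estimated similarity first; ties broken by node id.
        ranked = sorted(nbrs, key=lambda v: (-estimated_similarity(v), v))
        keep = math.ceil(len(nbrs) ** exponent)
        for v in ranked[:keep]:
            kept.add((min(u, v), max(u, v)))
    return kept
```

Because every node keeps at least one edge under this rule, no node becomes isolated by the sparsification, while edges whose endpoints share few neighbors (such as a bridge between two dense groups) tend to be ranked last and dropped.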