In this paper we study how to sparsify a graph, i.e., how to reduce the edge set while keeping the nodes intact, so as to enable faster graph clustering without sacrificing quality. The main idea behind our approach is to preferentially retain the edges whose endpoints are likely to belong to the same cluster. We propose to rank edges using a simple similarity-based heuristic that we compute efficiently by comparing the minhash signatures of the two nodes incident to each edge. For each node, we select the top few edges to be retained in the sparsified graph. Extensive empirical results on several real networks, using four state-of-the-art graph clustering and community discovery algorithms, show that our approach realizes excellent speedups (often in the range of 10x to 50x) with little or no deterioration in the quality of the resulting clusters. In fact, for at least two of the four clustering algorithms, our sparsification consistently enables higher clustering accuracy.
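The ranking-and-selection idea described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The abstract does not state which similarity is estimated, how many hash functions are used, or how many edges each node keeps, so the sketch assumes Jaccard similarity between neighbor sets, 64 linear hash functions over integer node ids, and a fixed budget `k` per node. The names `minhash_signatures` and `sparsify` are hypothetical.

```python
import random
from collections import defaultdict


def minhash_signatures(adj, num_hashes=64, seed=0):
    """Return {node: signature}, where each signature entry is the minimum
    hash value over the node's neighbor set under one hash function."""
    rng = random.Random(seed)
    prime = (1 << 61) - 1
    # Assumption: linear hashes h(x) = (a*x + b) mod prime on integer node ids.
    params = [(rng.randrange(1, prime), rng.randrange(prime))
              for _ in range(num_hashes)]
    return {
        u: [min((a * v + b) % prime for v in nbrs) for a, b in params]
        for u, nbrs in adj.items()
    }


def sparsify(edges, k=2, num_hashes=64, seed=0):
    """Keep, for every node, its k incident edges with the highest estimated
    neighbor-set similarity. Returns a set of (small_id, large_id) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    sig = minhash_signatures(adj, num_hashes, seed)

    def similarity(u, v):
        # The fraction of matching signature positions estimates the
        # Jaccard similarity of the two neighbor sets.
        return sum(x == y for x, y in zip(sig[u], sig[v])) / num_hashes

    kept = set()
    for u, nbrs in adj.items():
        # Rank u's edges by estimated similarity; break ties by node id.
        ranked = sorted(nbrs, key=lambda v: (-similarity(u, v), v))
        for v in ranked[:k]:
            kept.add((min(u, v), max(u, v)))
    return kept
```

Calling `sparsify(edges, k=2)` on a list of integer-id edge pairs returns the retained edges. An edge survives if either endpoint ranks it among its top `k`, so every node keeps at least one edge and the result has at most `k` times the number of nodes in edges. A fixed `k` is only one possible choice; a degree-dependent budget would fit the same structure.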