Colibri: fast mining of large static and dynamic graphs

Authors:
Hanghang Tong;Spiros Papadimitriou;Jimeng Sun;Philip S. Yu;Christos Faloutsos
Affiliations:
Carnegie Mellon University, Pittsburgh, PA, USA;IBM T.J. Watson, Hawthorne, NY, USA;IBM T.J. Watson, Hawthorne, NY, USA;University of Illinois at Chicago, Chicago, IL, USA;Carnegie Mellon University, Pittsburgh, PA, USA
Venue:
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2008

Abstract

Low-rank approximations of the adjacency matrix of a graph are essential in finding patterns (such as communities) and detecting anomalies. Additionally, it is desirable to track the low-rank structure as the graph evolves over time, efficiently and within limited storage. Real graphs typically have thousands or millions of nodes, but are usually very sparse. However, standard decompositions such as SVD do not preserve sparsity. This has led to the development of methods such as CUR and CMD, which seek a non-orthogonal basis by sampling the columns and/or rows of the sparse matrix. Yet these approaches typically produce overcomplete bases, wasting both space and time. In this paper, we propose the family of Colibri methods to deal with these challenges. Our version for static graphs, Colibri-S, iteratively finds a non-redundant basis, and we prove that it has no loss of accuracy compared to the best competitors (CUR and CMD), while achieving significant savings in space and time: on real data, Colibri-S requires much less space and is orders of magnitude faster (in proportion to the square of the number of non-redundant columns). Additionally, we propose an efficient update algorithm for dynamic, time-evolving graphs, Colibri-D. Our evaluation on a large, real network traffic dataset shows that Colibri-D is over 100 times faster than the best published competitor (CMD).
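
The idea of a non-redundant basis can be illustrated with a small sketch. This is not the Colibri-S algorithm itself, whose details the abstract does not give; it is a plain Gram-Schmidt style filter, swapped in for illustration, that drops sampled columns already lying in the span of the columns kept so far. The function names (`nonredundant_columns`, `approximate`), the tolerance, and the toy matrix are all assumptions, and the example uses a dense NumPy array for brevity, whereas real graph adjacency matrices would be stored sparsely.

```python
import numpy as np

def nonredundant_columns(A, sampled, tol=1e-8):
    """Keep each sampled column only if it is not (numerically)
    in the span of the columns kept so far. Hypothetical helper."""
    kept = []
    Q = np.zeros((A.shape[0], 0))  # orthonormal basis of the kept columns' span
    for j in sampled:
        col = A[:, j].astype(float)
        residual = col - Q @ (Q.T @ col)  # part of col outside the current span
        norm = np.linalg.norm(residual)
        if norm > tol * max(1.0, np.linalg.norm(col)):
            kept.append(j)
            Q = np.hstack([Q, (residual / norm)[:, None]])
    return kept

def approximate(A, cols):
    """Project A onto the span of the chosen columns (least squares)."""
    L = A[:, cols].astype(float)
    coeffs, *_ = np.linalg.lstsq(L, A, rcond=None)
    return L @ coeffs

# Toy 5x6 matrix of rank 2: every column repeats one of two patterns.
base = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
A = base[:, [0, 1, 0, 0, 1, 1]]

sampled = [0, 2, 3, 1, 4, 5, 0]        # a sample with many redundant columns
kept = nonredundant_columns(A, sampled)

full = approximate(A, sorted(set(sampled)))  # overcomplete basis
slim = approximate(A, kept)                  # non-redundant basis
print(kept, np.allclose(full, slim))
```

In this toy case the two kept columns span the same subspace as all six sampled ones, so both bases give the same approximation while the non-redundant one stores far fewer columns. That is the kind of saving the abstract describes, shown here only on synthetic data.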