Applied numerical linear algebra
Applied numerical linear algebra
PLAPACK: parallel linear algebra package design overview
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Google's MapReduce programming model – Revisited
Science of Computer Programming
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
ECML PKDD '08 Proceedings of the European conference on Machine Learning and Knowledge Discovery in Databases - Part II
Scalable Tensor Decompositions for Multi-aspect Data Mining
ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Fast Counting of Triangles in Large Real Networks without Counting: Algorithms and Laws
ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
DOULION: counting triangles in massive graphs with a coin
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations
ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
An implementation of parallel eigenvalue computation using dual-level hybrid parallelism
ICA3PP'07 Proceedings of the 7th international conference on Algorithms and architectures for parallel processing
Mining large graphs: Algorithms, inference, and discoveries
ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Realistic, mathematically tractable graph generation and evolution, using kronecker multiplication
PKDD'05 Proceedings of the 9th European conference on Principles and Practice of Knowledge Discovery in Databases
EigenSpokes: surprising patterns and scalable community chipping in large graphs
PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
On the duality of data-intensive file system design: reconciling HDFS and PVFS
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Managing and mining large graphs: patterns and algorithms
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
OPAvion: mining and visualization in large graphs
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Thought leaders during crises in massive social networks
Statistical Analysis and Data Mining
Parallel and I/O efficient set covering algorithms
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
GigaTensor: scaling tensor analysis up by 100 times - algorithms and discoveries
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Using R for iterative and incremental processing
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
MapReduce algorithms for big data analysis
Proceedings of the VLDB Endowment
Presto: distributed machine learning and graph processing with sparse matrices
Proceedings of the 8th ACM European Conference on Computer Systems
Big graph mining: algorithms and discoveries
ACM SIGKDD Explorations Newsletter
The family of mapreduce and large-scale data processing systems
ACM Computing Surveys (CSUR)
Hi-index | 0.00 |
Given a graph with billions of nodes and edges, how can we find patterns and anomalies? Are there nodes that participate in too many or too few triangles? Are there close-knit near-cliques? These questions are expensive to answer unless we have the first several eigenvalues and eigenvectors of the graph adjacency matrix. However, eigensolvers suffer from subtle problems (e.g., convergence) for large sparse matrices, let alone for billion-scale ones. We address this problem with the proposed HEIGEN algorithm, which we carefully design to be accurate, efficient, and able to run on the highly scalable MAPREDUCE (HADOOP) environment. This enables HEIGEN to handle matrices more than 1000× larger than those which can be analyzed by existing algorithms. We implement HEIGEN and run it on the M45 cluster, one of the top 50 supercomputers in the world. We report important discoveries about near-cliques and triangles on several real-world graphs, including a snapshot of the Twitter social network (38Gb, 2 billion edges) and the "YahooWeb" dataset, one of the largest publicly available graphs (120Gb, 1.4 billion nodes, 6.6 billion edges).