Applied numerical linear algebra
Applied numerical linear algebra
CiteSeer: an automatic citation indexing system
Proceedings of the third ACM conference on Digital libraries
Multilevel k-way hypergraph partitioning
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
On power-law relationships of the Internet topology
Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication
Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications netowrking
Automatic multimedia cross-modal correlation discovery
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Graphs over time: densification laws, shrinking diameters and possible explanations
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Fast Random Walk with Restart and Its Applications
ICDM '06 Proceedings of the Sixth International Conference on Data Mining
MapReduce: simplified data processing on large clusters
Communications of the ACM - 50th anniversary issue: 1958 - 2008
Statistical properties of community structure in large social and information networks
Proceedings of the 17th international conference on World Wide Web
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Scalable Tensor Decompositions for Multi-aspect Data Mining
ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Fast Counting of Triangles in Large Real Networks without Counting: Algorithms and Laws
ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
On compressing social networks
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
BBM: bayesian browsing model from petabyte-scale data
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations
ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
Networks, Crowds, and Markets: Reasoning About a Highly Connected World
Networks, Crowds, and Markets: Reasoning About a Highly Connected World
Pregel: a system for large-scale graph processing
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Data warehousing and analytics infrastructure at facebook
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
HADI: Mining Radii of Large Graphs
ACM Transactions on Knowledge Discovery from Data (TKDD)
Patterns on the Connected Components of Terabyte-Scale Graphs
ICDM '10 Proceedings of the 2010 IEEE International Conference on Data Mining
SystemML: Declarative machine learning on MapReduce
ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Large-scale matrix factorization with distributed stochastic gradient descent
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Human mobility, social ties, and link prediction
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Spectral analysis for billion-scale graphs: discoveries and implementation
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part II
Beyond 'Caveman Communities': Hubs and Spokes for Graph Compression and Mining
ICDM '11 Proceedings of the 2011 IEEE 11th International Conference on Data Mining
EigenSpokes: surprising patterns and scalable community chipping in large graphs
PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
Distributed GraphLab: a framework for machine learning and data mining in the cloud
Proceedings of the VLDB Endowment
GigaTensor: scaling tensor analysis up by 100 times - algorithms and discoveries
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
gbase: an efficient analysis platform for large graphs
The VLDB Journal — The International Journal on Very Large Data Bases
Hi-index | 0.00 |
How do we find patterns and anomalies in very large graphs with billions of nodes and edges? How to mine such big graphs efficiently? Big graphs are everywhere, ranging from social networks and mobile call networks to biological networks and the World Wide Web. Mining big graphs leads to many interesting applications including cyber security, fraud detection, Web search, recommendation, and many more. In this paper we describe Pegasus, a big graph mining system built on top of MapReduce, a modern distributed data processing platform. We introduce GIM-V, an important primitive that Pegasus uses for its algorithms to analyze structures of large graphs. We also introduce HEigen, a large scale eigensolver which is also a part of Pegasus. Both GIM-V and HEigen are highly optimized, achieving linear scale up on the number of machines and edges, and providing 9.2x and 76x faster performance than their naive counterparts, respectively. Using Pegasus, we analyze very large, real world graphs with billions of nodes and edges. Our findings include anomalous spikes in the connected component size distribution, the 7 degrees of separation in a Web graph, and anomalous adult advertisers in the who-follows-whom Twitter social network.