In this paper, we describe PEGASUS, an open source Peta Graph Mining library which performs typical graph mining tasks such as computing the diameter of the graph, computing the radius of each node, and finding the connected components. As the size of graphs reaches several giga-, tera- or petabytes, the need for such a library grows too. To the best of our knowledge, PEGASUS is the first such library implemented on top of the Hadoop platform, the open source version of MapReduce. Many graph mining operations (PageRank, spectral clustering, diameter estimation, connected components, etc.) are essentially repeated matrix-vector multiplications. In this paper we describe a very important primitive for PEGASUS, called GIM-V (Generalized Iterated Matrix-Vector multiplication). GIM-V is highly optimized, achieving (a) good scale-up in the number of available machines, (b) running time linear in the number of edges, and (c) performance more than 5 times faster than the non-optimized version of GIM-V. Our experiments ran on M45, one of the top 50 supercomputers in the world. We report our findings on several real graphs, including one of the largest publicly available Web graphs, thanks to Yahoo!, with 6.7 billion edges.
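To make the claim that such operations reduce to repeated matrix-vector multiplication more concrete, the following is a minimal single-machine Python sketch of a generalized iterated matrix-vector product, applied to connected components. It is only an illustration under stated assumptions: the function name `gim_v`, the three customizable operations (`combine2`, `combine_all`, `assign`), and the toy graph are chosen for this sketch and are not the PEGASUS API, and everything runs in memory rather than on Hadoop.

```python
from collections import defaultdict


def gim_v(edges, v, combine2, combine_all, assign, max_iters=50):
    """Repeat a generalized matrix-vector product until v stops changing.

    edges: list of (i, j, m_ij) entries of a sparse matrix M (hypothetical format)
    v:     dict mapping node id -> current value
    """
    for _ in range(max_iters):
        # Step 1: pair every matrix entry m_ij with the vector entry v[j].
        partial = defaultdict(list)
        for i, j, m_ij in edges:
            partial[i].append(combine2(m_ij, v[j]))
        # Step 2: reduce the partial results per row, then merge with the old value.
        new_v = {i: assign(v[i], combine_all(partial[i])) for i in v}
        if new_v == v:  # fixed point reached
            break
        v = new_v
    return v


# Toy undirected graph (an assumption for illustration): 0-1-2 and 3-4.
undirected = [(0, 1), (1, 2), (3, 4)]
edges = [(a, b, 1) for a, b in undirected] + [(b, a, 1) for a, b in undirected]

# Connected components: each node repeatedly adopts the smallest label it sees.
labels = gim_v(
    edges,
    v={n: n for n in range(5)},
    combine2=lambda m_ij, v_j: v_j,                        # pass the neighbour's label
    combine_all=lambda xs: min(xs, default=float("inf")),  # smallest neighbour label
    assign=lambda old, new: min(old, new),                 # keep the smaller label
)
print(labels)
```

With ordinary multiplication for `combine2`, summation for `combine_all`, and plain replacement for `assign`, the same loop becomes a standard iterated matrix-vector product, which is the shape that power-iteration methods such as PageRank follow.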