While high-level data-parallel frameworks such as MapReduce simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning algorithms, and can lead to inefficient learning systems. To help fill this critical void, we introduced the GraphLab abstraction, which naturally expresses asynchronous, dynamic, graph-parallel computation while ensuring data consistency and achieving a high degree of parallel performance in the shared-memory setting. In this paper, we extend the GraphLab framework to the substantially more challenging distributed setting while preserving strong data consistency guarantees. We develop graph-based extensions to pipelined locking and data versioning to reduce network congestion and mitigate the effect of network latency. We also introduce fault tolerance to the GraphLab abstraction using the classic Chandy-Lamport snapshot algorithm and demonstrate how it can be easily implemented by exploiting the GraphLab abstraction itself. Finally, we evaluate our distributed implementation of the GraphLab abstraction on a large Amazon EC2 deployment and show 1-2 orders of magnitude performance gains over Hadoop-based implementations.
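To make the snapshot idea concrete, the following is a minimal single-process sketch of how a Chandy-Lamport style snapshot might be phrased as a per-vertex update over a graph. It is an illustration under stated assumptions, not the GraphLab API: the function name `snapshot`, the FIFO queue standing in for a scheduler, and the dictionary-based graph are all hypothetical, the graph is assumed undirected with no self-loops, and concurrency with ongoing computation is ignored entirely.

```python
from collections import deque


def snapshot(vertex_data, edge_data, start):
    """Toy snapshot expressed as a per-vertex update (hypothetical names).

    vertex_data: dict mapping vertex -> value
    edge_data:   dict mapping frozenset({u, v}) -> value (undirected, no self-loops)
    start:       vertex at which the snapshot is initiated
    """
    # Build adjacency lists from the edge keys.
    neighbors = {v: set() for v in vertex_data}
    for edge in edge_data:
        u, v = tuple(edge)
        neighbors[u].add(v)
        neighbors[v].add(u)

    saved_vertices, saved_edges = {}, {}
    snapshotted = set()
    pending = deque([start])  # stand-in for a scheduler (assumption)

    while pending:
        v = pending.popleft()
        if v in snapshotted:
            continue  # this vertex has already taken part in the snapshot
        saved_vertices[v] = vertex_data[v]
        for u in neighbors[v]:
            if u not in snapshotted:
                # The first endpoint to run saves the shared edge and
                # schedules the other endpoint, acting like a marker.
                key = frozenset((u, v))
                saved_edges[key] = edge_data[key]
                pending.append(u)
        snapshotted.add(v)
    return saved_vertices, saved_edges


# Usage on a three-vertex path a - b - c.
vertices = {"a": 1, "b": 2, "c": 3}
edges = {frozenset(("a", "b")): 10, frozenset(("b", "c")): 20}
saved_v, saved_e = snapshot(vertices, edges, "a")
```

In this sketch each edge is saved exactly once, by whichever endpoint runs first, and the snapshot only reaches vertices connected to `start`. A distributed implementation would additionally need the consistency and scheduling guarantees the abstract refers to, which this toy version does not model.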