Improving large graph processing on partitioned graphs in the cloud

Authors:
Rishan Chen;Mao Yang;Xuetian Weng;Byron Choi;Bingsheng He;Xiaoming Li
Affiliations:
Shenzhen Graduate School of Peking University, China;Microsoft Research Asia;Shenzhen Graduate School of Peking University, China;Hong Kong Baptist University;Nanyang Technological University;Shenzhen Graduate School of Peking University, China
Venue:
Proceedings of the Third ACM Symposium on Cloud Computing
Year:
2012

Citing 29
Cited 3

A parallel algorithm for multilevel graph partitioning and sparse matrix ordering

Journal of Parallel and Distributed Computing
A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs

SIAM Journal on Scientific Computing
Parallel multilevel k-way partitioning scheme for irregular graphs

Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
The Google file system

SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions

IEEE Transactions on Knowledge and Data Engineering
MapReduce: simplified data processing on large clusters

OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Performance analysis of MPI collective operations

Cluster Computing
Bigtable: a distributed storage system for structured data

OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Dryad: distributed data-parallel programs from sequential building blocks

Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Parallel multilevel algorithms for hypergraph partitioning

Journal of Parallel and Distributed Computing
Dcell: a scalable and fault-tolerant network structure for data centers

Proceedings of the ACM SIGCOMM 2008 conference on Data communication
DISG: A DIStributed Graph Repository for Web Infrastructure (Invited Paper)

ISUC '08 Proceedings of the 2008 Second International Symposium on Universal Communication
Quincy: fair scheduling for distributed computing clusters

Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
The nature of data center traffic: measurements & analysis

Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference
PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations

ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads

Proceedings of the VLDB Endowment
Comet: batched stream processing for data intensive distributed computing

Proceedings of the 1st ACM symposium on Cloud computing
G-Store: a scalable data store for transactional multi key access in the cloud

Proceedings of the 1st ACM symposium on Cloud computing
Pregel: a system for large-scale graph processing

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Indexing multi-dimensional data in a cloud system

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Large graph processing in the cloud

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
The impact of virtualization on network performance of amazon EC2 data center

INFOCOM'10 Proceedings of the 29th conference on Information communications
The little engine(s) that could: scaling online social networks

Proceedings of the ACM SIGCOMM 2010 conference
Improving MapReduce performance in heterogeneous environments

OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Distributed systems meet economics: pricing in the cloud

HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
Network traffic characteristics of data centers in the wild

IMC '10 Proceedings of the 10th ACM SIGCOMM conference on Internet measurement
Fast distributed graph partition and application

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Piccolo: building fast, distributed programs with partitioned tables

OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Distributed GraphLab: a framework for machine learning and data mining in the cloud

Proceedings of the VLDB Endowment

Mizan: a system for dynamic load balancing in large-scale graph processing

Proceedings of the 8th ACM European Conference on Computer Systems
WTF: the who to follow service at Twitter

Proceedings of the 22nd international conference on World Wide Web
Giraphx: parallel yet serializable large-scale graph processing

Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

As the study of large graphs over hundreds of gigabytes becomes increasingly popular for various data-intensive applications in cloud computing, developing large graph processing systems has become a hot and fruitful research area. Many of those existing systems support a vertex-oriented execution model and allow users to develop custom logics on vertices. However, the inherently random access pattern on the vertex-oriented computation generates a significant amount of network traffic. While graph partitioning is known to be effective to reduce network traffic in graph processing, there is little attention given to how graph partitioning can be effectively integrated into large graph processing in the cloud environment. In this paper, we develop a novel graph partitioning framework to improve the network performance of graph partitioning itself, partitioned graph storage and vertex-oriented graph processing. All optimizations are specifically designed for the cloud network environment. In experiments, we develop a system prototype following Pregel (the latest vertex-oriented graph engine by Google), and extend it with our graph partitioning framework. We conduct the experiments with a real-world social network and synthetic graphs over 100GB each in a local cluster and on Amazon EC2. Our experimental results demonstrate the efficiency of our graph partitioning framework, and the effectiveness of network performance aware optimizations on the large graph processing engine.