Large graph processing in the cloud

Authors:
Rishan Chen;Xuetian Weng;Bingsheng He;Mao Yang
Affiliations:
Peking University and Microsoft Research Asia, Beijing, China;Peking University and Microsoft Research Asia, Beijing, China;Microsoft Research Asia, Beijing, China;Microsoft Research Asia, Beijing, China
Venue:
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Year:
2010

Citing 8
Cited 9

A parallel algorithm for multilevel graph partitioning and sparse matrix ordering

Journal of Parallel and Distributed Computing
Parallel multilevel k-way partitioning scheme for irregular graphs

Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions

IEEE Transactions on Knowledge and Data Engineering
MapReduce: simplified data processing on large clusters

OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Dryad: distributed data-parallel programs from sequential building blocks

Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
SCOPE: easy and efficient parallel processing of massive data sets

Proceedings of the VLDB Endowment
PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations

ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
Wave computing in the cloud

HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems

Making time-stepped applications tick in the cloud

Proceedings of the 2nd ACM Symposium on Cloud Computing
Distributed GraphLab: a framework for machine learning and data mining in the cloud

Proceedings of the VLDB Endowment
GraphChi: large-scale graph computation on just a PC

OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Improving large graph processing on partitioned graphs in the cloud

Proceedings of the Third ACM Symposium on Cloud Computing
GPS: a graph processing system

Proceedings of the 25th International Conference on Scientific and Statistical Database Management
The family of mapreduce and large-scale data processing systems

ACM Computing Surveys (CSUR)
Simplifying Scalable Graph Processing with a Domain-Specific Language

Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Horton+: a distributed system for processing declarative reachability queries over partitioned graphs

Proceedings of the VLDB Endowment
Parallel processing of large graphs

Future Generation Computer Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

As the study of graphs, such as web and social graphs, becomes increasingly popular, the requirements of efficiency and programming flexibility of large graph processing tasks challenge existing tools. We propose to demonstrate Surfer, a large graph processing engine designed to execute in the cloud. Surfer provides two basic primitives for programmers - MapReduce and propagation. MapReduce, originally developed by Google, processes different key-value pairs in parallel, and propagation is an iterative computational pattern that transfers information along the edges from a vertex to its neighbors in the graph. These two primitives are complementary in graph processing. MapReduce is suitable for processing flat data structures, such as vertex-oriented tasks, and propagation is optimized for edge-oriented tasks on partitioned graphs. To further improve the programmability of large graph processing, Surfer consists of a small set of high level building blocks that use these two primitives. Developers may also construct custom building blocks. Surfer further provides a GUI (Graphical User Interface) using which developers can visually create large graph processing tasks. Surfer transforms a task into an execution plan composed of MapReduce and propagation operations. It then automatically applies various optimizations to improve the efficiency of distributed execution. Surfer also provides a visualization tool to monitor the detailed execution dynamics of the execution plan to show the interesting tradeoffs between MapReduce and propagation. We demonstrate our system in two ways: first, we demo the ease-of-programming features of the system; second, we show the efficiency of the system with a series of applications on a social network. We find that Surfer is simple to use and is highly efficient for large graph-based tasks.