Large graph processing in the cloud

  • Authors:
  • Rishan Chen;Xuetian Weng;Bingsheng He;Mao Yang

  • Affiliations:
  • Peking University and Microsoft Research Asia, Beijing, China;Peking University and Microsoft Research Asia, Beijing, China;Microsoft Research Asia, Beijing, China;Microsoft Research Asia, Beijing, China

  • Venue:
  • Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

As the study of graphs, such as web and social graphs, becomes increasingly popular, the requirements of efficiency and programming flexibility of large graph processing tasks challenge existing tools. We propose to demonstrate Surfer, a large graph processing engine designed to execute in the cloud. Surfer provides two basic primitives for programmers - MapReduce and propagation. MapReduce, originally developed by Google, processes different key-value pairs in parallel, and propagation is an iterative computational pattern that transfers information along the edges from a vertex to its neighbors in the graph. These two primitives are complementary in graph processing. MapReduce is suitable for processing flat data structures, such as vertex-oriented tasks, and propagation is optimized for edge-oriented tasks on partitioned graphs. To further improve the programmability of large graph processing, Surfer consists of a small set of high level building blocks that use these two primitives. Developers may also construct custom building blocks. Surfer further provides a GUI (Graphical User Interface) using which developers can visually create large graph processing tasks. Surfer transforms a task into an execution plan composed of MapReduce and propagation operations. It then automatically applies various optimizations to improve the efficiency of distributed execution. Surfer also provides a visualization tool to monitor the detailed execution dynamics of the execution plan to show the interesting tradeoffs between MapReduce and propagation. We demonstrate our system in two ways: first, we demo the ease-of-programming features of the system; second, we show the efficiency of the system with a series of applications on a social network. We find that Surfer is simple to use and is highly efficient for large graph-based tasks.