Adaptive partitioning for large-scale dynamic graphs

  • Authors:
  • Luis Vaquero; Félix Cuadrado; Dionysios Logothetis; Claudio Martella

  • Affiliations:
  • Queen Mary University of London; Queen Mary University of London; Telefónica I+D; VU University Amsterdam

  • Venue:
  • Proceedings of the 4th annual Symposium on Cloud Computing
  • Year:
  • 2013

Abstract

Mining large-scale graphs is increasingly important, as it provides a powerful way of extracting useful information from real-world data. Efficient processing of that volume of information requires partitioning the graph across multiple nodes in a distributed system. However, traversing edges across distributed partitions incurs a significant performance penalty due to the additional cost of inter-partition communication. Minimising the number of edges cut between partitions reduces the communication cost between neighbouring vertices, while balanced graph partitioning is required for load balancing [2].

These large graphs represent real-world information, which is inherently dynamic. Recent systems such as Kineograph [1] can process changing graphs, but they do not consider the impact of dynamism on graph partitioning. To illustrate this impact, we built a call graph from mobile Call Detail Record data, with a sliding window defining the creation and removal of nodes and edges. The graph was partitioned using three different techniques: modulo hash (HSH), the most popular partitioning technique because of its scalability and balanced partitions [2]; a state-of-the-art streaming partitioning technique (deterministic greedy, DTG) [3]; and our adaptive repartitioning heuristic (ADP). Figure 1 shows the evolution of the partitioning quality, expressed as the ratio of edges that are cut across different partitions. While a good partitioning strategy significantly improves the initial ratio of cuts, the quality of the partitioning degrades over time, resulting in a higher communication penalty. Preventing this degradation with current approaches would require a full graph repartition, which can be extremely costly for large-scale graphs and generates downtime gaps in the system. While this problem does not deeply affect batch processing systems, it can greatly impact the throughput and latency of graph processing systems that require faster response times.

We propose an adaptive approach, where the partitioning is optimised with every change to the graph, concurrently with the computation. We improve graph partitioning in a scalable manner by applying a local decision heuristic based on decentralised, iterative vertex migration. The heuristic [4] migrates vertices between partitions to minimise the number of cut edges while keeping partitions balanced as the graph structure changes at run time. We tested this approach in a system that processes dynamic graphs and adapts to graph changes by applying the iterative vertex migration algorithm. While continuous migrations add overhead to the computation, we observed in several experiments that the total execution time was reduced by over 50%. A more detailed analysis of the system and the experiments is available in [4].
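
The abstract compares three ways of assigning vertices to partitions and tracks the ratio of cut edges as the quality metric. Below is a minimal, single-machine sketch of that metric together with simplified versions of the two baselines (HSH-style modulo hashing and a DTG-style greedy streaming placement); the function names, capacity rule, and tie-breaking are illustrative assumptions rather than the paper's exact formulations.

```python
def hash_partition(vertex_id, num_partitions):
    """HSH-style baseline: assign a vertex to a partition by modulo hashing its id."""
    return hash(vertex_id) % num_partitions


def greedy_stream_partition(neighbours, assignment, sizes, capacity):
    """DTG-style baseline (simplified): place an arriving vertex in the partition
    that already holds most of its assigned neighbours, among partitions with room."""
    num_partitions = len(sizes)
    votes = [0] * num_partitions
    for n in neighbours:
        p = assignment.get(n)
        if p is not None:
            votes[p] += 1
    candidates = [p for p in range(num_partitions) if sizes[p] < capacity]
    if not candidates:  # fall back if every partition is already full
        candidates = list(range(num_partitions))
    return max(candidates, key=lambda p: votes[p])


def cut_edge_ratio(edges, assignment):
    """Fraction of edges whose endpoints sit in different partitions
    (the quantity plotted over time in Figure 1)."""
    if not edges:
        return 0.0
    cut = sum(1 for u, v in edges if assignment[u] != assignment[v])
    return cut / len(edges)
```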
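
The adaptive heuristic (ADP) is described as decentralised, iterative vertex migration that moves vertices to reduce cut edges while keeping partitions balanced as the graph changes. The sketch below shows one plausible shape of a single migration iteration under that description; the migration condition, capacity bound, and data structures are assumptions for illustration, not the authors' implementation (see [4] for the actual heuristic).

```python
from collections import Counter


def migration_step(adjacency, assignment, capacity):
    """One decentralised-style iteration: each vertex checks, using only local
    information, whether moving to the partition where most of its neighbours
    live would reduce its cut edges, and migrates only if balance is preserved."""
    sizes = Counter(assignment.values())
    moved = 0
    for v, neighbours in adjacency.items():
        if not neighbours:
            continue
        here = assignment[v]
        neighbour_parts = Counter(assignment[n] for n in neighbours)
        target, votes = neighbour_parts.most_common(1)[0]
        # Migrate only when it strictly reduces v's cut edges and the target
        # partition still has spare capacity (keeps partitions balanced).
        if target != here and votes > neighbour_parts[here] and sizes[target] < capacity:
            assignment[v] = target
            sizes[here] -= 1
            sizes[target] += 1
            moved += 1
    return moved
```

One way to use such a step after each sliding-window update would be to repeat it until it returns 0 or an iteration budget is exhausted, which should lower the cut-edge ratio without repartitioning the whole graph.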