Efficient data partitioning model for heterogeneous graphs in the cloud

Authors:
Kisung Lee;Ling Liu
Affiliations:
Georgia Institute of Technology;Georgia Institute of Technology
Venue:
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Year:
2013

Abstract

As the size and variety of information networks continue to grow in many scientific and engineering domains, there is a growing demand for efficient processing of large heterogeneous graphs on clusters of compute nodes in the Cloud. One open issue is how to partition a large graph so that complex graph operations can be processed efficiently. In this paper, we present VB-Partitioner, a distributed data partitioning model and set of algorithms for efficient processing of graph operations over large-scale graphs in the Cloud. VB-Partitioner has three salient features. First, it introduces vertex blocks (VBs) and extended vertex blocks (EVBs) as the building blocks for semantic partitioning of large graphs. Second, it uses vertex block grouping algorithms to place vertex blocks that are highly correlated in graph structure into the same partition. Third, it employs a VB-partition-guided query partitioning model that speeds up the parallel processing of graph pattern queries by reducing the amount of inter-partition query processing. We conduct extensive experiments on several real-world graphs with millions of vertices and billions of edges. Our results show that VB-Partitioner significantly outperforms the popular random block-based data partitioner in terms of query latency and scalability over large-scale graphs.
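
The abstract does not define vertex blocks or extended vertex blocks precisely, so the following is only a minimal sketch under stated assumptions: a vertex block is taken to be an anchor vertex together with its outgoing edges, and an extended vertex block is taken to be its expansion to all edges within k outgoing hops. The function names, the edge-triple representation, and the hash-based placement are all hypothetical; in particular, the hash placement is a simple stand-in and not the paper's correlation-based grouping algorithm.

```python
from collections import defaultdict, deque


def build_out_adjacency(edges):
    """Map each source vertex to its outgoing (label, target) pairs."""
    adj = defaultdict(list)
    for src, label, dst in edges:
        adj[src].append((label, dst))
    return adj


def vertex_block(adj, anchor):
    """Assumed definition: the anchor vertex plus its outgoing edges."""
    return {(anchor, label, dst) for label, dst in adj.get(anchor, [])}


def extended_vertex_block(adj, anchor, k):
    """Assumed definition: every edge within k outgoing hops of the anchor."""
    block = set()
    seen = {anchor}
    frontier = deque([(anchor, 0)])
    while frontier:
        vertex, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for label, dst in adj.get(vertex, []):
            block.add((vertex, label, dst))
            if dst not in seen:
                seen.add(dst)
                frontier.append((dst, depth + 1))
    return block


def hash_partition(adj, vertices, num_partitions, k):
    """Stand-in placement: assign each anchor's k-hop block by a stable hash.

    This is NOT the grouping algorithm from the paper; it only shows how
    whole blocks, rather than individual edges, become the unit of placement.
    """
    partitions = [set() for _ in range(num_partitions)]
    for vertex in vertices:
        index = sum(vertex.encode("utf-8")) % num_partitions
        partitions[index] |= extended_vertex_block(adj, vertex, k)
    return partitions


# Hypothetical toy graph of (source, edge label, target) triples.
edges = [
    ("a", "knows", "b"),
    ("b", "worksAt", "c"),
    ("c", "locatedIn", "d"),
]
adj = build_out_adjacency(edges)
one_hop = vertex_block(adj, "a")
two_hop = extended_vertex_block(adj, "a", 2)
parts = hash_partition(adj, ["a", "b", "c", "d"], 2, 1)
```

Under these assumptions, a larger k replicates more edges across partitions, which is one way a two-hop pattern query anchored at `a` could be answered inside a single partition without inter-partition communication.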