Towards effective partition management for large graphs

Authors:
Shengqi Yang;Xifeng Yan;Bo Zong;Arijit Khan
Affiliations:
University of California at Santa Barbara, Santa Barbara, CA, USA;University of California at Santa Barbara, Santa Barbara, CA, USA;University of California at Santa Barbara, Santa Barbara, CA, USA;University of California at Santa Barbara, Santa Barbara, CA, USA
Venue:
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Year:
2012

Abstract

Searching and mining large graphs is critical to a variety of application domains, ranging from community detection in social networks to de novo genome sequence assembly. Scalable processing of large graphs requires careful partitioning and distribution of graphs across clusters. In this paper, we investigate the problem of managing large-scale graphs in clusters and study the access characteristics of local graph queries such as breadth-first search, random walk, and SPARQL queries, which are popular in real applications. These queries exhibit strong access locality and therefore require specific data partitioning strategies. In this work, we propose a Self Evolving Distributed Graph Management Environment (Sedge) to minimize inter-machine communication during graph query processing across multiple machines. To improve query response time and throughput, Sedge introduces a two-level partition management architecture with complementary primary partitions and dynamic secondary partitions. These two kinds of partitions are able to adapt in real time to changes in query workload. Sedge also includes a set of workload analyzing algorithms whose time complexity is linear or sublinear in the graph size. Empirical results show that it significantly improves distributed graph processing on today's commodity clusters.
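
To make the notion of inter-machine communication for a local query concrete, the following is a minimal sketch, not Sedge's actual algorithm. It counts how many times a depth-bounded breadth-first search first reaches a vertex stored in a different partition from the vertex it came from. The function name `cross_partition_hops`, the toy path graph, and both partition assignments are assumptions made purely for illustration.

```python
from collections import deque


def cross_partition_hops(adj, part, source, max_depth):
    """Count BFS discovery edges whose endpoints lie in different partitions.

    adj:  dict mapping each vertex to a list of its neighbours
    part: dict mapping each vertex to a partition label (hypothetical layout)
    """
    seen = {source}
    queue = deque([(source, 0)])
    hops = 0
    while queue:
        v, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the query radius
        for w in adj[v]:
            if w in seen:
                continue
            seen.add(w)
            if part[v] != part[w]:
                hops += 1  # this step would need inter-machine communication
            queue.append((w, depth + 1))
    return hops


# Toy undirected path graph 0-1-2-3-4-5 (illustrative data only).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

# Two hypothetical partitionings whose boundaries fall in different places.
primary = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
complementary = {0: "A", 1: "B", 2: "B", 3: "B", 4: "C", 5: "C"}

print(cross_partition_hops(adj, primary, 2, 1))
print(cross_partition_hops(adj, complementary, 2, 1))
```

In this toy setup, a one-hop query starting at vertex 2 sits on a partition boundary under the first assignment but falls entirely inside one partition under the second, which is the intuition behind keeping partition sets with different boundaries.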