Managing large dynamic graphs efficiently

Authors:
Jayanta Mondal; Amol Deshpande
Affiliations:
University of Maryland, College Park, MD, USA; University of Maryland, College Park, MD, USA
Venue:
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Year:
2012

Abstract

There is an increasing need to ingest, manage, and query large volumes of graph-structured data arising in applications such as social networks, communication networks, and biological networks. Graph databases that explicitly reason about the graph structure of the data and support flexible schemas as well as node-centric or edge-centric analysis and querying are ideal for storing such data. However, although there is much work on single-site graph databases and on efficiently executing different types of queries over large graphs, there has been little work to date on understanding the challenges of distributed graph databases, which are needed to handle data at this scale. In this paper, we propose the design of an in-memory, distributed graph data management system aimed at managing a large-scale, dynamically changing graph and supporting low-latency query processing over it. The key challenge in a distributed graph database is that partitioning a graph across a set of machines inherently results in a large number of distributed traversals across partitions, even to answer simple queries. We propose aggressive replication of the nodes in the graph to support low-latency querying, and investigate three novel techniques to minimize the communication bandwidth and storage requirements. First, we develop a hybrid replication policy that monitors node read-write frequencies to dynamically decide what data to replicate and whether to replicate it eagerly or lazily. Second, we propose a clustering-based approach to amortize the cost of making these replication decisions. Finally, we propose using a fairness criterion to dictate how replication decisions should be made. We provide both theoretical analysis and efficient algorithms for the optimization problems that arise. We have implemented our framework as middleware on top of the open-source CouchDB key-value store. We evaluate our system on a social graph and show that it handles very large graphs efficiently and significantly reduces network bandwidth consumption.
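The key challenge described in the abstract, that partitioning forces traversals across machines, can be made concrete by counting how many one-hop neighbor lookups leave a node's home partition. The sketch below is a hypothetical illustration, not the paper's system: it assumes a toy graph, a `node % 2` hash partitioning, and invented helper names (`build_adjacency`, `remote_neighbor_reads`), and it treats a lookup as local when the neighbor's primary copy or a replica of it lives on the same partition.

```python
from collections import defaultdict

# Hypothetical toy graph: undirected edges between integer node ids.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (3, 5), (1, 4)]


def build_adjacency(edges):
    """Return an undirected adjacency map: node -> set of neighbors."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def remote_neighbor_reads(adj, partition_of, replicas=None):
    """Count neighbor lookups that must go to another machine when every
    node runs one 1-hop query from its home partition.

    replicas maps a partition id to the set of nodes replicated there.
    Returns (remote_lookups, total_lookups).
    """
    replicas = replicas or {}
    remote = total = 0
    for node, neighbors in adj.items():
        home = partition_of[node]
        for nbr in neighbors:
            total += 1
            # Local if the neighbor's primary or a replica is on this partition.
            if partition_of[nbr] != home and nbr not in replicas.get(home, ()):
                remote += 1
    return remote, total


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    partition_of = {n: n % 2 for n in adj}  # hash partitioning onto 2 machines
    print(remote_neighbor_reads(adj, partition_of))  # (12, 16)
    # Replicating node 2 on partition 1 makes two of those lookups local.
    print(remote_neighbor_reads(adj, partition_of, {1: {2}}))  # (10, 16)
```

Even on this tiny graph, 12 of 16 neighbor lookups cross machines under hash partitioning, which is the kind of overhead that motivates replicating nodes.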
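The hybrid eager/lazy decision can be sketched under a deliberately simple cost model. That model is an assumption and not the one analyzed in the paper: keeping a replica of node v on partition p up to date eagerly costs one push message per write to v, while serving v lazily costs one pull message per read of v issued at p. The names `FrequencyMonitor`, `AccessStats`, and `choose_policy`, and the exponential decay used to age old counts, are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    EAGER = "eager"  # push every update of v to the replica on partition p
    LAZY = "lazy"    # pull v from its home partition when p reads it


@dataclass
class AccessStats:
    reads_from_p: float  # (decayed) reads of node v issued at partition p
    writes_to_v: float   # (decayed) writes to node v


class FrequencyMonitor:
    """Tracks per-(partition, node) read counts and per-node write counts,
    multiplying all counts by `decay` at the end of each window."""

    def __init__(self, decay: float = 0.5):
        self.decay = decay
        self.reads = defaultdict(float)   # (partition, node) -> count
        self.writes = defaultdict(float)  # node -> count

    def record_read(self, partition, node):
        self.reads[(partition, node)] += 1

    def record_write(self, node):
        self.writes[node] += 1

    def end_window(self):
        # Age old observations so decisions follow the recent workload.
        for key in self.reads:
            self.reads[key] *= self.decay
        for key in self.writes:
            self.writes[key] *= self.decay

    def stats(self, partition, node) -> AccessStats:
        return AccessStats(self.reads.get((partition, node), 0.0),
                           self.writes.get(node, 0.0))


def choose_policy(stats: AccessStats, push_cost: float = 1.0,
                  pull_cost: float = 1.0) -> Policy:
    """Eager costs one push per write; lazy costs one pull per read.
    Ties go to lazy, which avoids shipping updates nobody reads."""
    eager_cost = stats.writes_to_v * push_cost
    lazy_cost = stats.reads_from_p * pull_cost
    return Policy.EAGER if eager_cost < lazy_cost else Policy.LAZY
```

For example, a node read 10 times from partition 1 but written only twice in the same window is kept eagerly up to date there; if writes later dominate, the policy flips to lazy.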
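One way to read the fairness criterion is as a per-node guarantee: every node must find at least a fraction tau of its neighbors on its own partition, either as primaries or as replicas. That interpretation, the parameter name `tau`, and the greedy heuristic below (repeatedly add the replica that helps the most currently violating nodes) are assumptions for illustration; the paper's own algorithms and analysis may differ.

```python
from collections import Counter


def local_fraction(node, adj, partition_of, replicas):
    """Fraction of node's neighbors available on its home partition,
    either as primary copies or as replicas."""
    nbrs = adj[node]
    if not nbrs:
        return 1.0
    home = partition_of[node]
    here = replicas.get(home, set())
    local = sum(1 for n in nbrs if partition_of[n] == home or n in here)
    return local / len(nbrs)


def enforce_fairness(adj, partition_of, tau):
    """Hypothetical greedy heuristic: add replicas until every node has at
    least a tau fraction of its neighbors locally.
    Returns a dict mapping partition id -> set of replicated nodes."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must be in [0, 1]")
    replicas = {}
    while True:
        violators = [v for v in adj
                     if local_fraction(v, adj, partition_of, replicas) < tau]
        if not violators:
            return replicas
        # Score each candidate (partition, remote node) replica by how many
        # violating nodes on that partition it would help.
        gain = Counter()
        for v in violators:
            home = partition_of[v]
            for n in adj[v]:
                if partition_of[n] != home and n not in replicas.get(home, set()):
                    gain[(home, n)] += 1
        (part, node), _ = gain.most_common(1)[0]
        replicas.setdefault(part, set()).add(node)


if __name__ == "__main__":
    adj = {0: {1, 2}, 1: {0, 2, 4}, 2: {0, 1, 3},
           3: {2, 4, 5}, 4: {1, 3, 5}, 5: {3, 4}}
    partition_of = {n: n % 2 for n in adj}  # hash partitioning onto 2 machines
    print(enforce_fairness(adj, partition_of, tau=0.5))
```

Because one replica on a partition serves every local node adjacent to it, the greedy step favors replicas shared by many violating nodes, which tends to keep the total number of replicas, and hence storage, down.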