GBASE: a scalable and general graph management system

Authors:
U. Kang; Hanghang Tong; Jimeng Sun; Ching-Yung Lin; Christos Faloutsos
Affiliations:
Carnegie Mellon University, Pittsburgh, PA, USA; IBM T.J. Watson, Hawthorne, NY, USA; IBM T.J. Watson, Hawthorne, NY, USA; IBM T.J. Watson, Hawthorne, NY, USA; Carnegie Mellon University, Pittsburgh, PA, USA
Venue:
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2011

Abstract

Graphs appear in numerous applications, including cyber-security, the Internet, social networks, protein networks, recommendation systems, and many more. Graphs with millions or even billions of nodes and edges are commonplace. How can such large graphs be stored efficiently? What are the core operations and queries on those graphs? How can graph queries be answered quickly? We propose GBASE, a scalable and general graph management and mining system. The key novelties lie in 1) our storage and compression scheme for a parallel setting and 2) the carefully chosen graph operations and their efficient implementation. We designed and implemented an instance of GBASE using MapReduce/Hadoop. GBASE provides a parallel indexing mechanism for graph mining operations that both saves storage space and accelerates queries. We ran numerous experiments on real graphs, spanning billions of nodes and edges, and we show that our proposed GBASE is indeed fast, scalable and nimble, with significant savings in space and time.