GBASE: an efficient analysis platform for large graphs

Authors:
U. Kang;Hanghang Tong;Jimeng Sun;Ching-Yung Lin;Christos Faloutsos
Affiliations:
Carnegie Mellon University, Pittsburgh, USA;IBM T. J. Watson, Yorktown Heights, USA;IBM T. J. Watson, Yorktown Heights, USA;IBM T. J. Watson, Yorktown Heights, USA;Carnegie Mellon University, Pittsburgh, USA
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2012

Abstract

Graphs appear in numerous applications, including cyber security, the Internet, social networks, protein networks, recommendation systems, citation networks, and many more. Graphs with millions or even billions of nodes and edges are commonplace. How can such large graphs be stored efficiently? What are the core operations and queries on these graphs? How can graph queries be answered quickly? We propose GBASE, an efficient analysis platform for large graphs. The key novelties lie in (1) our storage and compression scheme for parallel, distributed settings and (2) the carefully chosen graph operations and their efficient implementations. We designed and implemented an instance of GBASE using MapReduce/Hadoop. GBASE provides a parallel indexing mechanism for graph operations that both saves storage space and accelerates query responses. We ran numerous experiments on real and synthetic graphs spanning billions of nodes and edges, and we show that GBASE is indeed fast, scalable, and nimble, with significant savings in space and time.
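The abstract names a storage and compression scheme plus an index that lets queries avoid reading irrelevant data, but does not spell out how they work. The following is a minimal single-process sketch under our own assumptions, not GBASE's actual implementation: it assumes the adjacency matrix is cut into fixed-size square blocks over contiguous node-id ranges, that each nonempty block is stored as gap-encoded varints, and that a row-block/column-block index lets a neighbor query decode only the blocks that can contain the answer. The names `BlockGraph`, `block_size`, `out_neighbors`, `in_neighbors`, and `stored_bytes` are hypothetical.

```python
from collections import defaultdict


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a little-endian base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varints(data: bytes) -> list[int]:
    """Decode a concatenation of varints back into integers."""
    values, cur, shift = [], 0, 0
    for byte in data:
        cur |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(cur)
            cur, shift = 0, 0
    return values


class BlockGraph:
    """Hypothetical block-compressed adjacency store (illustration only)."""

    def __init__(self, edges, block_size: int):
        self.b = block_size
        cells = defaultdict(list)
        for src, dst in edges:
            key = (src // self.b, dst // self.b)              # which block
            local = (src % self.b) * self.b + (dst % self.b)  # cell in block
            cells[key].append(local)
        self.blocks = {}                  # (row_block, col_block) -> bytes
        self.row_index = defaultdict(list)  # row_block -> col_blocks
        self.col_index = defaultdict(list)  # col_block -> row_blocks
        for key in sorted(cells):
            local_cells = sorted(set(cells[key]))
            # Store the first cell, then gaps between consecutive cells.
            gaps = [local_cells[0]] + [
                x - y for x, y in zip(local_cells[1:], local_cells)
            ]
            self.blocks[key] = b"".join(encode_varint(g) for g in gaps)
            self.row_index[key[0]].append(key[1])
            self.col_index[key[1]].append(key[0])

    def _cells(self, key) -> list[int]:
        """Decode one block back into sorted local cell indices."""
        total, result = 0, []
        for gap in decode_varints(self.blocks[key]):
            total += gap
            result.append(total)
        return result

    def out_neighbors(self, u: int) -> list[int]:
        """Decode only the blocks in u's row-block, via the row index."""
        row_block, local_row = divmod(u, self.b)
        result = []
        for col_block in self.row_index.get(row_block, []):
            for cell in self._cells((row_block, col_block)):
                r, c = divmod(cell, self.b)
                if r == local_row:
                    result.append(col_block * self.b + c)
        return result

    def in_neighbors(self, v: int) -> list[int]:
        """Decode only the blocks in v's column-block, via the column index."""
        col_block, local_col = divmod(v, self.b)
        result = []
        for row_block in self.col_index.get(col_block, []):
            for cell in self._cells((row_block, col_block)):
                r, c = divmod(cell, self.b)
                if c == local_col:
                    result.append(row_block * self.b + r)
        return result

    def stored_bytes(self) -> int:
        """Total size of all encoded blocks."""
        return sum(len(data) for data in self.blocks.values())
```

For example, `BlockGraph([(0, 1), (0, 5), (1, 2), (5, 0), (6, 7), (7, 6), (2, 6)], block_size=4)` keeps four nonempty blocks, and `out_neighbors(0)` decodes only blocks (0, 0) and (0, 1) to return `[1, 5]`. A real system would likely reorder nodes first so that edges concentrate in fewer, denser blocks, and would spread blocks across machines (for instance as MapReduce/Hadoop input records); this sketch skips both steps.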