GLADE: a scalable framework for efficient analytics

Authors:
Florin Rusu;Alin Dobra
Affiliations:
University of California, Merced, Merced, CA;University of Florida, Gainesville, FL
Venue:
ACM SIGOPS Operating Systems Review
Year:
2012

Abstract

In this paper we introduce GLADE, a scalable distributed framework for large-scale data analytics. GLADE consists of a simple user interface to define Generalized Linear Aggregates (GLAs), the fundamental abstraction at the core of GLADE, and a distributed runtime environment that executes GLAs by using parallelism extensively. GLAs are derived from User-Defined Aggregates (UDAs), a relational database extension that allows the user to add specialized aggregates to be executed inside the query processor. GLAs extend the UDA interface with Serialize/Deserialize methods for the state of the aggregate, which distributed computation requires. As a significant departure from UDAs, which can be invoked only through SQL, GLAs give the user direct access to the state of the aggregate, thus allowing for the computation of significantly more complex aggregate functions. The GLADE runtime is an execution engine optimized for GLA computation. The runtime takes the user-defined GLA code, compiles it inside the engine, and executes it close to the data by taking advantage of parallelism both within a single machine and across a cluster of computers. As a result, execution is as fast as the hardware allows (all our experimental tasks are I/O-bound) and scaleup is linear.
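
To make the GLA abstraction concrete, the following is a minimal sketch in Python, not GLADE's actual interface. The abstract names only Serialize/Deserialize; the remaining method names (init, accumulate, merge, terminate) are assumed here from the usual UDA convention, and AverageGLA, run_partition, and run_distributed are hypothetical names invented for illustration. A sequential loop stands in for the parallel runtime.

```python
import struct


class AverageGLA:
    """Hypothetical GLA that computes the mean of a numeric column."""

    def __init__(self):
        self.init()

    def init(self):
        # Aggregate state: running sum and number of items seen.
        self.total = 0.0
        self.count = 0

    def accumulate(self, value):
        # Fold one input item into the local state.
        self.total += value
        self.count += 1

    def merge(self, other):
        # Combine the state of another GLA instance into this one.
        self.total += other.total
        self.count += other.count

    def terminate(self):
        # Produce the final result; None when no items were seen.
        return self.total / self.count if self.count else None

    def serialize(self):
        # Pack the state into bytes so it can be shipped between nodes.
        return struct.pack("<dq", self.total, self.count)

    @classmethod
    def deserialize(cls, payload):
        # Rebuild a GLA instance from bytes produced by serialize().
        gla = cls()
        gla.total, gla.count = struct.unpack("<dq", payload)
        return gla


def run_partition(chunk):
    # Assumed stand-in for a worker: aggregate one data partition locally
    # and return the serialized state.
    gla = AverageGLA()
    for value in chunk:
        gla.accumulate(value)
    return gla.serialize()


def run_distributed(partitions):
    # Assumed stand-in for the coordinator: deserialize each worker's state
    # and merge it into a single GLA. Runs sequentially in this sketch.
    result = AverageGLA()
    for payload in map(run_partition, partitions):
        result.merge(AverageGLA.deserialize(payload))
    return result


if __name__ == "__main__":
    merged = run_distributed([[1, 2, 3], [4, 5], []])
    print(merged.terminate(), merged.count)
```

Because run_distributed returns the merged object rather than only the final value, the caller can inspect fields such as count directly, which loosely mirrors the direct access to aggregate state that the abstract contrasts with SQL-only UDAs.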