MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
A comparison of approaches to large-scale data analysis
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Distributed data-parallel computing using a high-level programming language
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
MAD skills: new analysis practices for big data
Proceedings of the VLDB Endowment
The DataPath system: a data-centric analytic processing engine for large data warehouses
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Parallel online aggregation in action
Proceedings of the 25th International Conference on Scientific and Statistical Database Management
Astronomical data processing in EXTASCID
Proceedings of the 25th International Conference on Scientific and Statistical Database Management
Scalable I/O-bound parallel incremental gradient descent for big data analytics in GLADE
Proceedings of the Second Workshop on Data Analytics in the Cloud
Sampling estimators for parallel online aggregation
BNCOD'13 Proceedings of the 29th British National conference on Big Data
Hi-index | 0.00 |
We present GLADE, a scalable distributed system for large scale data analytics. GLADE takes analytical functions expressed through the User-Defined Aggregate (UDA) interface and executes them efficiently on the input data. The entire computation is encapsulated in a single class which requires the definition of four methods. The runtime takes the user code and executes it right near the data by taking full advantage of the parallelism available inside a single machine as well as across a cluster of computing nodes. The demonstration has two goals. First, it presents the architecture of GLADE and how processing is done by using a series of analytical functions. Second, it compares GLADE with two different classes of systems for data analytics: a relational database (PostgreSQL) enhanced with UDAs and Map-Reduce (Hadoop). We show how the analytical functions are coded into each of these systems (for Map-Reduce, we use both Java code as well as Pig Latin) and compare their expressiveness, scalability, and running time efficiency.