A generate-test-aggregate parallel programming library: systematic parallel programming for MapReduce

Authors:
Yu Liu; Kento Emoto; Zhenjiang Hu
Affiliations:
The Graduate University for Advanced Studies, Tokyo, Japan; University of Tokyo, Tokyo, Japan; National Institute of Informatics, Tokyo, Japan
Venue:
Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores
Year:
2013

Abstract

Generate-Test-Aggregate (GTA for short) is a novel programming model for MapReduce that dramatically simplifies the development of efficient parallel algorithms. Under the GTA model, a parallel computation is encoded in a simple pattern: generate all candidates, test them to filter out invalid ones, and aggregate the valid ones to produce the result. Once users specify their parallel computations in the GTA style, they get efficient MapReduce programs for free, owing to an automatic optimization given by the GTA theory. In this paper, we report our implementation of a GTA library that supports programming in the GTA model. The library provides a compact programming interface that hides the complexity of GTA's internal transformation, so that many problems can be encoded in the GTA style easily and straightforwardly. The GTA transformation and optimization mechanism implemented inside is a black box to end users, but users can extend the library by modifying existing generators, testers, or aggregators, or by implementing new ones, through the library's standard programming interfaces. The library supports both sequential and parallel execution on a single computer, as well as on-cluster execution with MapReduce computing engines. We evaluate the library with experiments on large data, whose results show its efficiency, scalability, and usefulness.
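
To make the generate, test, aggregate pattern concrete, here is a minimal Python sketch for a 0/1 knapsack problem. It is not the GTA library's API: every name (`generate`, `within_capacity`, `max_value`, `gta_naive`, `item_table`, `combine`, `gta_fused`) and the `(weight, value)` item encoding are assumptions made for illustration, and weights are assumed to be non-negative integers. `gta_naive` follows the specification literally and enumerates exponentially many candidates. `gta_fused` is a hand-written program in the spirit of the optimization the abstract describes: the test is folded into a table indexed by total weight, and tables are merged with an associative operator, which is the kind of reduction a MapReduce engine could run in parallel.

```python
from functools import reduce
from itertools import combinations
from typing import Callable, Iterable, List, Sequence, Tuple

Item = Tuple[int, int]  # hypothetical encoding: (weight, value)
NEG_INF = float("-inf")


def generate(items: Sequence[Item]) -> Iterable[Tuple[Item, ...]]:
    """Generate: enumerate every selection of items (2^n candidates)."""
    for k in range(len(items) + 1):
        yield from combinations(items, k)


def within_capacity(capacity: int) -> Callable[[Sequence[Item]], bool]:
    """Test: accept a candidate only if its total weight fits the capacity."""
    return lambda cand: sum(w for w, _ in cand) <= capacity


def max_value(candidates: Iterable[Sequence[Item]]) -> int:
    """Aggregate: the largest total value among the surviving candidates."""
    return max(sum(v for _, v in cand) for cand in candidates)


def gta_naive(items: Sequence[Item], capacity: int) -> int:
    """The specification as written: generate, then test, then aggregate."""
    return max_value(filter(within_capacity(capacity), generate(items)))


def item_table(item: Item, capacity: int) -> List[float]:
    """Best value per exact total weight for one item: skip it or take it."""
    w, v = item
    table = [NEG_INF] * (capacity + 1)
    table[0] = 0  # skipping the item
    if w <= capacity:
        table[w] = max(table[w], v)  # taking the item
    return table


def combine(a: List[float], b: List[float]) -> List[float]:
    """Associative merge: add weights, keep the best value, drop overweight."""
    n = len(a)
    out = [NEG_INF] * n
    for i, av in enumerate(a):
        if av == NEG_INF:
            continue
        for j in range(n - i):  # only pairs whose total weight stays in range
            if b[j] != NEG_INF:
                out[i + j] = max(out[i + j], av + b[j])
    return out


def gta_fused(items: Sequence[Item], capacity: int) -> float:
    """Same result without enumerating candidates; the fold could run in parallel."""
    identity = [0] + [NEG_INF] * capacity
    tables = (item_table(it, capacity) for it in items)
    return max(reduce(combine, tables, identity))
```

Because `combine` is associative with `identity` as its unit, the input list can be split into chunks, each chunk folded independently, and the partial tables merged afterwards. For example, with items `[(1, 10), (2, 20), (3, 35)]` and capacity 4, both functions select the items of weight 1 and 3 for a total value of 45.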