Formal derivation of efficient parallel programs by construction of list homomorphisms
ACM Transactions on Programming Languages and Systems (TOPLAS)
Parallelization in calculational forms
POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Make it practical: a generic linear-time algorithm for solving maximum-weightsum problems
ICFP '00 Proceedings of the fifth ACM SIGPLAN international conference on Functional programming
Introduction to Functional Programming
Introduction to Functional Programming
Systematic Extraction and Implementation of Divide-and-Conquer Parallelism
PLILP '96 Proceedings of the 8th International Symposium on Programming Languages: Implementations, Logics, and Programs
Formal Derivation of Parallel Program for 2-Dimensional Maximum Segment Sum Problem
Euro-Par '96 Proceedings of the Second International Euro-Par Conference on Parallel Processing - Volume I
Systematic Efficient Parallelization of Scan and Other List Homomorphisms
Euro-Par '96 Proceedings of the Second International Euro-Par Conference on Parallel Processing-Volume II
Deriving Parallel Codes via Invariants
SAS '00 Proceedings of the 7th International Symposium on Static Analysis
Computational Linguistics
Interpreting the data: Parallel analysis with Sawzall
Scientific Programming - Dynamic Grids and Worldwide Computing
Automatic inversion generates divide-and-conquer parallel programs
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Maximum segment sum is back: deriving algorithms for two segment problems with bounded lengths
PEPM '08 Proceedings of the 2008 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation
Building a high-level dataflow system on top of Map-Reduce: the Pig experience
Proceedings of the VLDB Endowment
Automatic parallelization via matrix multiplication
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Towards systematic parallel programming over mapreduce
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
ESOP'12 Proceedings of the 21st European conference on Programming Languages and Systems
Filter-embedding semiring fusion for programming with MapReduce
Formal Aspects of Computing - Celebrating the 60th Birthday of Carroll Morgan
Hi-index | 0.00 |
Generate-Test-Aggregate (GTA for short) is a novel programming model for MapReduce, dramatically simplifying the development of efficient parallel algorithms. Under the GTA model, a parallel computation is encoded into a simple pattern: generate all candidates, test them to filter out invalid ones, and aggregate valid ones to make the result. Once users specify their parallel computations in the GTA style, they get efficient MapReduce programs for free owing to an automatic optimization given by the GTA theory. In this paper, we report our implementation of a GTA library to support programming in the GTA model. In this library, we provide a compact programming interface for hiding the complexity of GTA's internal transformation, so that many problems can be encoded in the GTA style easily and straightforwardly. The GTA transformation and optimization mechanism implemented inside is a black-box to the end users, while users can extend the library by modifying existing (or implementing new) generators, testers or aggregators through standard programming interfaces of the GTA library. This GTA programming library supports both sequential or parallel execution on single computer and on-cluster execution with MapReduce computing engines. We evaluate our library by giving the results of our experiments on large data to show the efficiency, scalability and usefulness of this GTA library.