The Generate-Test-Aggregate (GTA for short) algorithm follows a simple and straightforward programming pattern for combinatorial problems: first, generate all candidates; second, test the candidates and filter out the invalid ones; finally, aggregate the valid ones into the final result. These three processing steps are specified by three building blocks, namely the generator, the tester, and the aggregator. Despite the simplicity of the algorithm design, implementing the GTA algorithm naively by following the three processing steps, i.e., by brute force, results in an exponential-cost computation, which makes it impractical for processing large data. The theory of GTA shows that if the definitions of the generator, tester, and aggregator satisfy certain conditions, an efficient (usually near-linear cost) MapReduce program can be automatically derived from the GTA algorithm. The principle of GTA is attractive, but making it practically useful remains an important and challenging problem due to the complexity of GTA program transformations. In this paper, we report on our study and implementation of a practical GTA library (written in the functional language Scala) that provides a systematic parallel programming approach for big-data analysis with MapReduce. The library provides a simple functional-style programming interface and hides all the internal transformations. With this library, users can write parallel programs in a sequential manner in terms of the GTA algorithm, and the efficiency of the generated MapReduce programs is guaranteed systematically. As a result, parallel programming for many problems need no longer be a tough job. We demonstrate the usefulness of our GTA library on some interesting problems involving large data and show that many applications can be solved easily and efficiently with our library.
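To make the contrast between brute force and an efficient derived program concrete, here is a minimal Python sketch on a hypothetical example problem, 0/1 knapsack (choose items whose total weight is at most a capacity, maximizing total value). It is not the Scala library's interface or the paper's derivation procedure, and the functions `gta_brute_force`, `make_table`, `combine`, and `gta_fused` are illustrative names only. The naive version literally generates, tests, and aggregates every selection. The fused version assumes non-negative weights and indexes partial results by the tester's state (the running weight, capped at `capacity + 1`). This turns the whole computation into a map followed by an associative reduce, which is the shape a MapReduce program needs.

```python
from functools import reduce
from itertools import product

# Hypothetical illustration (not the paper's Scala library): 0/1 knapsack as GTA.
# items: list of (weight, value) pairs with non-negative weights; capacity >= 0.

def gta_brute_force(items, capacity):
    """Naive GTA: enumerate all 2^n selections, so the cost is exponential in n."""
    best = 0  # the empty selection is always valid and has value 0
    for mask in product((False, True), repeat=len(items)):      # generate
        chosen = [it for it, keep in zip(items, mask) if keep]
        if sum(w for w, _ in chosen) <= capacity:                # test
            best = max(best, sum(v for _, v in chosen))          # aggregate (max)
    return best

def make_table(item, capacity):
    """Map one item to {tester_state: best_value} for 'skip' and 'take'."""
    w, v = item
    table = {0: 0}                           # skip the item: weight 0, value 0
    key = min(w, capacity + 1)               # every overweight total collapses into one state
    table[key] = max(table.get(key, v), v)   # take the item
    return table

def combine(t1, t2, capacity):
    """Associative merge of two tables: add weights (capped) and values, keep the max."""
    out = {}
    for w1, v1 in t1.items():
        for w2, v2 in t2.items():
            key = min(w1 + w2, capacity + 1)
            if key not in out or v1 + v2 > out[key]:
                out[key] = v1 + v2
    return out

def gta_fused(items, capacity, chunks=1):
    """Fused GTA: a map followed by an associative reduce, so chunks can run in parallel."""
    identity = {0: 0}

    def step(a, b):
        return combine(a, b, capacity)

    tables = [make_table(it, capacity) for it in items]          # map phase
    size = max(1, -(-len(tables) // chunks))                      # ceil(n / chunks)
    partials = [reduce(step, tables[i:i + size], identity)       # per-chunk reduce
                for i in range(0, len(tables), size)]
    final = reduce(step, partials, identity)                      # global reduce
    return max(v for w, v in final.items() if w <= capacity)     # test, then aggregate

items = [(2, 3), (3, 4), (4, 5), (5, 6)]
print(gta_brute_force(items, 5), gta_fused(items, 5, chunks=2))  # prints: 7 7
```

Each `combine` touches at most (capacity + 2)^2 pairs of entries, so for a fixed capacity the fused version's cost grows linearly with the number of items, whereas the brute-force version inspects all 2^n selections. Because `combine` is associative with identity `{0: 0}`, the per-chunk reductions could in principle be distributed across MapReduce workers; the `chunks` parameter only simulates that partitioning inside one process.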