A generate-test-aggregate parallel programming library: systematic parallel programming for MapReduce

  • Authors: Yu Liu; Kento Emoto; Zhenjiang Hu
  • Affiliations: The Graduate University for Advanced Studies, Tokyo, Japan; University of Tokyo, Tokyo, Japan; National Institute of Informatics, Tokyo, Japan
  • Venue: Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores
  • Year: 2013

Abstract

Generate-Test-Aggregate (GTA for short) is a novel programming model for MapReduce that dramatically simplifies the development of efficient parallel algorithms. Under the GTA model, a parallel computation is encoded into a simple pattern: generate all candidates, test them to filter out invalid ones, and aggregate the valid ones to produce the result. Once users specify their parallel computations in the GTA style, they get efficient MapReduce programs for free, owing to an automatic optimization given by the GTA theory. In this paper, we report our implementation of a GTA library that supports programming in the GTA model. The library provides a compact programming interface that hides the complexity of GTA's internal transformation, so that many problems can be encoded in the GTA style easily and straightforwardly. The GTA transformation and optimization mechanism implemented inside is a black box to end users, while users can extend the library by modifying existing (or implementing new) generators, testers, or aggregators through the library's standard programming interfaces. The library supports both sequential and parallel execution on a single computer, as well as on-cluster execution with MapReduce computing engines. We evaluate the library with experiments on large data, and the results show its efficiency, scalability, and usefulness.