Manimal: relational optimization for data-intensive programs

  • Authors:
  • Michael J. Cafarella; Christopher Ré

  • Affiliations:
  • University of Michigan, Ann Arbor, MI; University of Wisconsin, Madison, WI

  • Venue:
  • Proceedings of the 13th International Workshop on the Web and Databases
  • Year:
  • 2010

Abstract

The MapReduce distributed programming framework is very popular, but currently lacks the optimization techniques that have been standard with relational database systems for many years. This paper proposes Manimal, which uses static code analysis to detect MapReduce program semantics and thereby enable wholly automatic optimization of MapReduce programs. For example, a programmer's map function that emits data only when an if statement holds true is essentially encoding a selection condition; code analysis can detect and characterize these conditions. If Manimal has an appropriate index available, it can then alter MapReduce execution to use it. Manimal can address many different optimization opportunities, including projections, structure-aware data compression, and others. However, this paper illustrates the system by focusing on one: efficient selection. We give a static analysis algorithm that can detect selections in user programs, and cover how Manimal can employ a B+Tree to execute these selections efficiently at runtime. Testing Manimal on several standard MapReduce programs, we show that selection alone can automatically reduce a standard program's runtime to 63% of conventional MapReduce execution time on identical hardware. We also give an in-depth discussion of other optimization targets and detection techniques.
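
The selection idea in the abstract can be illustrated with a small Python sketch. It is not the paper's analysis algorithm or its runtime: the map function, the `(url, rank)` record layout, and the helpers `detect_selection`, `run_full_scan`, and `run_with_index` are all hypothetical, and a sorted list searched with `bisect` stands in for the B+Tree. The sketch recognizes only one narrow pattern (a single top-level `if name > constant` guarding every emit) and then checks that an index-assisted run produces the same output as a full scan.

```python
import ast
import bisect

# Hypothetical user map function, held as source text so it can be inspected
# without being run. Records are assumed to be (url, rank) tuples.
MAP_SOURCE = """
def map_fn(record):
    url, rank = record
    if rank > 50:
        yield (url, 1)
"""


def detect_selection(source):
    """Return (variable, threshold) if every emit is guarded by one top-level
    `if <name> > <constant>` with no else branch; otherwise return None."""
    func = ast.parse(source).body[0]
    guards = [s for s in func.body if isinstance(s, ast.If)]
    emits_outside = any(
        isinstance(node, ast.Yield)
        for stmt in func.body if not isinstance(stmt, ast.If)
        for node in ast.walk(stmt)
    )
    if len(guards) != 1 or guards[0].orelse or emits_outside:
        return None
    test = guards[0].test
    if (isinstance(test, ast.Compare) and len(test.ops) == 1
            and isinstance(test.ops[0], ast.Gt)
            and isinstance(test.left, ast.Name)
            and isinstance(test.comparators[0], ast.Constant)):
        return test.left.id, test.comparators[0].value
    return None


# Materialize the map function so both execution strategies can call it.
_namespace = {}
exec(MAP_SOURCE, _namespace)
map_fn = _namespace["map_fn"]


def run_full_scan(records):
    """Conventional execution: call map_fn on every input record."""
    out = []
    for record in records:
        out.extend(map_fn(record))
    return sorted(out)


def run_with_index(records, threshold):
    """Index-assisted execution: call map_fn only on records with rank above
    the threshold. A real system would build the index ahead of time; here it
    is rebuilt on each call to keep the sketch short."""
    by_rank = sorted(records, key=lambda r: r[1])
    ranks = [r[1] for r in by_rank]
    start = bisect.bisect_right(ranks, threshold)  # first rank > threshold
    out = []
    for record in by_rank[start:]:
        out.extend(map_fn(record))
    return sorted(out)


RECORDS = [("a.com", 10), ("b.com", 75), ("c.com", 50), ("d.com", 99)]

if __name__ == "__main__":
    variable, threshold = detect_selection(MAP_SOURCE)
    print(variable, threshold)
    print(run_with_index(RECORDS, threshold) == run_full_scan(RECORDS))
```

The map function is left unmodified in both runs; only the set of records fed to it changes, which is why the two outputs must agree whenever the detected predicate is correct.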