Automatic optimization of parallel dataflow programs

  • Authors:
  • Christopher Olston; Benjamin Reed; Adam Silberstein; Utkarsh Srivastava

  • Affiliations:
  • Yahoo! Research; Yahoo! Research; Yahoo! Research; Yahoo! Research

  • Venue:
  • ATC'08: USENIX 2008 Annual Technical Conference
  • Year:
  • 2008

Abstract

Large-scale parallel dataflow systems, e.g., Dryad and Map-Reduce, have attracted significant attention recently. High-level dataflow languages such as Pig Latin and Sawzall are being layered on top of these systems, to enable faster program development and more maintainable code. These languages engender greater transparency in program structure, and open up opportunities for automatic optimization. This paper proposes a set of optimization strategies for this context, drawing on and extending techniques from the database community.