An integrated framework for performance-based optimization of scientific workflows

  • Authors:
  • Vijay S. Kumar; P. Sadayappan; Gaurang Mehta; Karan Vahi; Ewa Deelman; Varun Ratnakar; Jihie Kim; Yolanda Gil; Mary Hall; Tahsin Kurc; Joel Saltz

  • Affiliations:
  • Ohio State University, Columbus, OH, USA; Ohio State University, Columbus, OH, USA; University of Southern California, Marina del Rey, CA, USA; University of Southern California, Marina del Rey, CA, USA; University of Southern California, Marina del Rey, CA, USA; University of Southern California, Marina del Rey, CA, USA; University of Southern California, Marina del Rey, CA, USA; University of Southern California, Marina del Rey, CA, USA; University of Utah, Salt Lake City, UT, USA; Emory University, Atlanta, GA, USA; Emory University, Atlanta, GA, USA

  • Venue:
  • Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing
  • Year:
  • 2009

Abstract

Data analysis processes in scientific applications can be expressed as coarse-grain workflows of complex data processing operations with data flow dependencies between them. Performance optimization of these workflows can be viewed as a search for a set of optimal values in a multi-dimensional parameter space. Some performance parameters, such as the grouping of workflow components and their mapping to machines, do not affect the accuracy of the output; others may require trading the output quality of individual components (and of the whole workflow) for performance. This paper describes an integrated framework that supports performance optimization along multiple dimensions of this parameter space. Using two real-world applications in the spatial data analysis domain, we present an experimental evaluation of the proposed framework.
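To make the "search over a multi-dimensional parameter space" framing concrete, here is a minimal sketch. It is not the framework described in the paper. The parameter names (`group_components`, `num_machines`, `sample_fraction`), the cost model, and the quality model are all hypothetical. Two dimensions (grouping and machine mapping) affect only execution time. The third, a quality-affecting knob, trades output accuracy for speed. A plain exhaustive search then picks the fastest configuration that meets a minimum quality threshold.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    group_components: bool  # hypothetical, performance-only: fuse components into fewer jobs
    num_machines: int       # hypothetical, performance-only: nodes the workflow is mapped to
    sample_fraction: float  # hypothetical, quality-affecting: fraction of input processed


def estimated_time(c: Config) -> float:
    """Made-up cost model (seconds): work shrinks with sampling and parallelism,
    while per-machine overhead is lower when components are grouped."""
    work = 1000.0 * c.sample_fraction / c.num_machines
    overhead_per_machine = 2.0 if c.group_components else 10.0
    return work + overhead_per_machine * c.num_machines


def estimated_quality(c: Config) -> float:
    """Made-up quality model: only the sampling knob changes output accuracy;
    grouping and machine mapping leave it untouched."""
    return c.sample_fraction


def optimize(min_quality: float) -> Config:
    """Exhaustively search the (small) parameter space for the fastest
    configuration whose estimated quality meets the threshold."""
    space = [
        Config(g, m, f)
        for g, m, f in product([False, True], [1, 2, 4, 8], [0.25, 0.5, 1.0])
    ]
    feasible = [c for c in space if estimated_quality(c) >= min_quality]
    return min(feasible, key=estimated_time)


if __name__ == "__main__":
    for q in (1.0, 0.5):
        best = optimize(q)
        print(q, best, estimated_time(best))
```

With full quality required, the search can tune only the accuracy-neutral dimensions. Relaxing the threshold to 0.5 also opens the quality dimension and yields a faster configuration. A real optimizer would replace the exhaustive loop and the toy models with measured or predicted component performance.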