ParaTrac: a fine-grained profiler for data-intensive workflows

  • Authors:
  • Nan Dun; Kenjiro Taura; Akinori Yonezawa

  • Affiliations:
  • The University of Tokyo, Bunkyo-Ku, Tokyo, Japan; The University of Tokyo, Bunkyo-Ku, Tokyo, Japan; The University of Tokyo, Bunkyo-Ku, Tokyo, Japan

  • Venue:
  • Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
  • Year:
  • 2010


Abstract

The realistic characteristics of data-intensive workflows are critical to optimal workflow orchestration, and profiling is an effective approach to investigating the behavior of such complex applications. ParaTrac is a fine-grained profiler for data-intensive workflows that uses user-level file system and process tracing techniques. First, ParaTrac enables users to quickly understand I/O characteristics at every level, from the entire application down to specific processes or files, by examining low-level I/O profiles. Second, ParaTrac automatically exploits fine-grained data-process interactions in workflows to help users intuitively and quantitatively investigate the realistic execution of data-intensive workflows. Experiments on thoroughly profiling the Montage workflow demonstrate both the scalability and the effectiveness of ParaTrac. The overhead of tracing thousands of processes is around 16%. We use low-level I/O profiles and informative workflow DAGs to illustrate the advantages of fine-grained profiling, which helps users comprehensively understand application behavior and refine the scheduling of complex workflows. Our study also suggests that current workflow management systems may use fine-grained profiles to provide more flexible control for optimal workflow execution.
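To make the idea of data-process interactions more concrete, here is a minimal Python sketch of how a workflow DAG might be derived from file I/O traces. It is not ParaTrac's implementation. The `IORecord` format and the `build_data_process_graph` and `process_dependencies` helpers are hypothetical, and the sketch ignores the timing of operations, which a real tracer would need in order to confirm that a read happens after the corresponding write. Reads become file-to-process edges, writes become process-to-file edges, and a producer-to-consumer edge is inferred whenever one process reads a file that another process wrote.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IORecord:
    """Hypothetical trace record: one process read or wrote some bytes of a file."""
    pid: int
    op: str      # "read" or "write"
    path: str
    nbytes: int


def build_data_process_graph(records):
    """Aggregate records into weighted edges of a bipartite data-process graph.

    Reads become file -> process edges and writes become process -> file edges.
    Each edge weight is the total number of bytes moved along that edge.
    """
    edges = defaultdict(int)
    for r in records:
        if r.op == "read":
            edges[(("file", r.path), ("proc", r.pid))] += r.nbytes
        elif r.op == "write":
            edges[(("proc", r.pid), ("file", r.path))] += r.nbytes
        else:
            raise ValueError(f"unknown operation: {r.op!r}")
    return dict(edges)


def process_dependencies(edges):
    """Infer (producer, consumer, file) triples through files shared by processes.

    Simplification (assumption): operation order is ignored, so any writer of
    a file is treated as a producer for every other process that reads it.
    """
    writers = defaultdict(set)   # path -> pids that wrote it
    readers = defaultdict(set)   # path -> pids that read it
    for src, dst in edges:
        if src[0] == "proc":
            writers[dst[1]].add(src[1])
        else:
            readers[src[1]].add(dst[1])
    deps = set()
    for path, producers in writers.items():
        for producer in producers:
            for consumer in readers.get(path, ()):
                if producer != consumer:
                    deps.add((producer, consumer, path))
    return deps


# Illustrative two-stage pipeline with made-up file names and sizes.
trace = [
    IORecord(1, "read", "raw.fits", 4096),
    IORecord(1, "write", "proj.fits", 8192),
    IORecord(2, "read", "proj.fits", 8192),
    IORecord(2, "write", "mosaic.fits", 16384),
]
edges = build_data_process_graph(trace)
deps = process_dependencies(edges)
print(deps)
```

Here the shared file `proj.fits` links process 1 to process 2, so the inferred dependency set is `{(1, 2, 'proj.fits')}`. The byte-weighted bipartite edges are kept separate from the process-level edges, so the same trace can yield both a per-file I/O profile and a process DAG.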