Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Understanding the realistic characteristics of data-intensive workflows is critical to optimal workflow orchestration, and profiling is an effective way to investigate the behavior of such complex applications. ParaTrac is a fine-grained profiler for data-intensive workflows that uses user-level file system and process tracing techniques. First, ParaTrac enables users to quickly understand I/O characteristics at any scope, from the entire application down to specific processes or files, by examining low-level I/O profiles. Second, ParaTrac automatically exploits fine-grained data-process interactions in the workflow to help users intuitively and quantitatively investigate the realistic execution of data-intensive workflows. Experiments that thoroughly profile the Montage workflow demonstrate both the scalability and the effectiveness of ParaTrac: the overhead of tracing thousands of processes is around 16%. We use low-level I/O profiles and informative workflow DAGs to illustrate the advantages of fine-grained profiling in helping users comprehensively understand application behavior and refine the scheduling of complex workflows. Our study also suggests that current workflow management systems could use fine-grained profiles to provide more flexible control for optimal workflow execution.
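As a rough illustration of how data-process interactions can yield a workflow DAG, the sketch below infers producer-consumer edges from a list of file-access records. It is not ParaTrac's implementation: the record layout `(pid, op, path)`, the function names `build_workflow_dag` and `io_summary`, and the sample process IDs and file names are all assumptions made for this example.

```python
from collections import defaultdict


def build_workflow_dag(trace):
    """Infer process-to-process edges from (pid, op, path) records.

    An edge (writer, reader, path) is added when a process reads a file
    that a different process wrote earlier in the trace.
    """
    last_writer = {}  # path -> pid of the most recent writer
    edges = set()
    for pid, op, path in trace:
        if op == "write":
            last_writer[path] = pid
        elif op == "read":
            writer = last_writer.get(path)
            if writer is not None and writer != pid:
                edges.add((writer, pid, path))
    return edges


def io_summary(trace):
    """Count read and write operations per process."""
    counts = defaultdict(lambda: {"read": 0, "write": 0})
    for pid, op, _path in trace:
        if op in ("read", "write"):
            counts[pid][op] += 1
    return {pid: dict(c) for pid, c in counts.items()}


# Hypothetical trace: two projection tasks feed one mosaic task.
trace = [
    (101, "read", "raw_1.fits"),
    (101, "write", "proj_1.fits"),
    (102, "read", "raw_2.fits"),
    (102, "write", "proj_2.fits"),
    (103, "read", "proj_1.fits"),
    (103, "read", "proj_2.fits"),
    (103, "write", "mosaic.fits"),
]

edges = build_workflow_dag(trace)
summary = io_summary(trace)
```

Here `edges` links each hypothetical projection process to the process that later reads its output, and `summary` gives a per-process operation count, a very coarse stand-in for the low-level I/O profiles the abstract describes.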