AME: an anyscale many-task computing engine

Authors:
Zhao Zhang; Daniel S. Katz; Matei Ripeanu; Michael Wilde; Ian T. Foster
Affiliations:
University of Chicago, Chicago, IL, USA; University of Chicago & Argonne National Laboratory, Chicago, IL, USA; University of British Columbia, Vancouver, BC, Canada; University of Chicago & Argonne National Laboratory, Chicago, IL, USA; University of Chicago & Argonne National Laboratory, Chicago, IL, USA
Venue:
Proceedings of the 6th workshop on Workflows in support of large-scale science
Year:
2011

Abstract

Many-Task Computing (MTC) is a new application category that encompasses increasingly popular applications in biology, economics, and statistics. The high inter-task parallelism and data-intensive processing requirements of these applications pose new challenges to existing supercomputer hardware-software stacks. These challenges include resource provisioning; task dispatching, dependency resolution, and load balancing; data management; and resilience. This paper examines the characteristics of MTC applications that create these challenges and identifies related gaps in the middleware that supports these applications on extreme-scale systems. Based on this analysis, we propose AME, an Anyscale MTC Engine, which addresses the scalability aspects of these gaps. We describe the AME framework and present performance results for both synthetic benchmarks and real applications. Our results show that AME's dispatching performance scales linearly up to 14,120 tasks/second on 16,384 cores with high efficiency. The overhead of the intermediate data management scheme does not increase significantly up to 16,384 cores. AME eliminates 73% of the file transfers between compute nodes and the global filesystem for the Montage astronomy application running on 2,048 cores. Our results indicate that AME scales well on today's petascale machines and is a strong candidate for exascale machines.
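
As a back-of-the-envelope reading of the dispatch figure in the abstract, the sketch below computes the per-core dispatch rate implied by 14,120 tasks/second on 16,384 cores, and the shortest average task duration at which that rate could still keep every core busy. The function names are illustrative and are not part of AME. The calculation assumes one task per core at a time and ignores every overhead other than dispatching, so it is a rough bound rather than a result from the paper.

```python
def per_core_dispatch_rate(tasks_per_second: float, cores: int) -> float:
    """Tasks dispatched per second, per core, at the given aggregate rate."""
    return tasks_per_second / cores


def min_task_seconds_to_saturate(tasks_per_second: float, cores: int) -> float:
    """Shortest average task duration (seconds) that keeps all cores busy.

    Assumption: each core runs one task at a time, so in steady state the
    dispatcher must supply cores / duration tasks per second.
    """
    return cores / tasks_per_second


if __name__ == "__main__":
    # Figures quoted in the abstract.
    rate = 14_120   # tasks/second
    cores = 16_384

    print(f"per-core dispatch rate: {per_core_dispatch_rate(rate, cores):.3f} tasks/s")
    print(f"minimum average task length: {min_task_seconds_to_saturate(rate, cores):.2f} s")
```

Under these assumptions, tasks averaging a little over one second would be the point at which dispatching itself becomes the limiting factor at this scale. Longer tasks leave headroom.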