We seek to enable efficient large-scale parallel execution of applications in which a shared filesystem abstraction is used to couple many tasks. Such parallel scripting (many-task computing, MTC) applications suffer poor performance and utilization on large parallel computers because of the volume of filesystem I/O and a lack of appropriate optimizations in the shared filesystem. Thus, we design and implement a scalable MTC data management system that uses aggregated compute node local storage for more efficient data movement strategies. We co-design the data management system with the data-aware scheduler to enable dataflow pattern identification and automatic optimization. The framework reduces the time to solution of parallel stages of an astronomy data analysis application, Montage, by 83.2% on 512 cores; decreases the time to solution of a seismology application, CyberShake, by 7.9% on 2,048 cores; and delivers BLAST performance better than mpiBLAST at various scales up to 32,768 cores, while preserving the flexibility of the original BLAST application.
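To put the reported reductions in time to solution on a common footing, they can be converted to speedup factors. The following is a minimal sketch of that arithmetic only, assuming the usual relation speedup = 1 / (1 - reduction); the function name `speedup_from_reduction` is hypothetical and not part of the described framework.

```python
def speedup_from_reduction(reduction_percent: float) -> float:
    """Convert a percentage reduction in time to solution into a speedup factor.

    Assumes speedup = old_time / new_time, where new_time = old_time * (1 - r).
    """
    if not 0 <= reduction_percent < 100:
        raise ValueError("reduction must be in the range [0, 100)")
    return 1.0 / (1.0 - reduction_percent / 100.0)


# Reductions quoted in the abstract above.
montage = speedup_from_reduction(83.2)     # parallel stages of Montage, 512 cores
cybershake = speedup_from_reduction(7.9)   # CyberShake, 2,048 cores

print(f"Montage parallel stages: {montage:.2f}x")   # prints 5.95x
print(f"CyberShake: {cybershake:.2f}x")             # prints 1.09x
```

Under this reading, the 83.2% reduction for Montage's parallel stages corresponds to roughly a sixfold speedup of those stages, while the 7.9% reduction for CyberShake corresponds to a speedup of about 1.09.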