Faster Collective Output through Active Buffering

Authors:
Xiaosong Ma, Marianne Winslett, Jonghyun Lee, Shengke Yu
Venue:
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Year:
2002

Abstract

Scientific applications often need to write out large arrays and associated metadata periodically for visualization or restart purposes. In this paper, we propose active buffering for collective I/O, in which processors actively organize their idle memory into a hierarchy of buffers for periodic output data. Active buffering uses one-sided communication so that I/O processors can fetch data from compute processors' buffers, and it performs the actual writing in the background while the compute processors continue computing. It adapts gracefully as buffers at different levels of the hierarchy fill and empty, and as new collective I/O requests arrive. Experimental results with synthetic benchmarks and a real rocket simulation code on the SGI Origin 2000 and IBM SP show that active buffering raises the apparent collective write throughput until it approaches the local memory bandwidth or the MPI bandwidth, under appropriate conditions. These speedups are due entirely to increased parallelism during I/O. They come in addition to any performance improvements that may result from buffering small requests.
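The core idea in the abstract can be illustrated with a minimal Python sketch: copy output into idle memory, return to computation, and let a background agent do the actual writing. This is a simplification under stated assumptions, not the authors' implementation:

- It models a single buffer level rather than a hierarchy.
- It uses a thread and a `queue.Queue` in place of MPI one-sided communication between compute and I/O processors.
- It stores chunks in a dict keyed by file offset instead of writing a real file.
- The names `ActiveBuffer`, `write`, and `close` are hypothetical.

```python
import queue
import threading


class ActiveBuffer:
    """Hypothetical single-level sketch of active buffering (not the paper's code).

    The caller (standing in for a compute processor) copies each output chunk
    into a bounded in-memory buffer and returns immediately. A background
    thread (standing in for an I/O processor) drains the buffer to storage.
    If the buffer is full, the chunk is written synchronously instead, so the
    caller never waits for buffer space.
    """

    def __init__(self, capacity_chunks, storage):
        self._buf = queue.Queue(maxsize=capacity_chunks)  # must be > 0 (0 means unbounded)
        self._storage = storage          # dict: file offset -> bytes (stand-in for a file)
        self._lock = threading.Lock()    # serializes updates to storage
        self.sync_writes = 0             # chunks that bypassed the buffer
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def _store(self, offset, data):
        with self._lock:
            self._storage[offset] = data

    def write(self, offset, data):
        chunk = (offset, bytes(data))    # copy so the caller may reuse its array
        try:
            self._buf.put_nowait(chunk)  # fast path: buffered, written in background
        except queue.Full:
            self._store(*chunk)          # buffer full: fall back to a synchronous write
            self.sync_writes += 1

    def _drain(self):
        while True:
            chunk = self._buf.get()      # blocks until data arrives
            if chunk is None:            # sentinel sent by close()
                return
            self._store(*chunk)

    def close(self):
        """Flush everything still buffered and stop the background writer."""
        self._buf.put(None)
        self._writer.join()


if __name__ == "__main__":
    storage = {}
    ab = ActiveBuffer(capacity_chunks=4, storage=storage)
    for step in range(8):
        # Each call returns without waiting for the "disk" unless the buffer is full.
        ab.write(offset=step * 16, data=bytes([step]) * 16)
    ab.close()
    print(len(storage), "chunks stored;", ab.sync_writes, "written synchronously")
```

The chunks are keyed by offset, so buffered and synchronously written chunks may reach storage in any order without corrupting the output. That property is what lets this sketch degrade gracefully when the buffer fills rather than stalling the caller.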