The ZeptoOS project is developing an open-source alternative to the proprietary software stacks available on contemporary massively parallel architectures. The aim is to enable computer science research on these architectures, enhance community collaboration, and foster innovation. In this paper, we introduce a component of ZeptoOS called ZOID, an I/O-forwarding infrastructure for architectures such as IBM Blue Gene that decouple file and socket I/O from the compute nodes, shipping those functions to dedicated I/O nodes. Through the use of optimized network protocols and data paths, as well as a multithreaded daemon running on I/O nodes, ZOID provides greater performance than does the stock infrastructure. We present a set of benchmark results that highlight the improvements. Crucially, the flexibility of our infrastructure is a vast improvement over the stock infrastructure, allowing users to forward data using custom-designed application interfaces, through an easy-to-use plug-in mechanism. This capability is used for real-time telescope data transfers, extensively discussed in the paper. Plug-in-specific threads implement prefetching of data obtained over sockets from an input cluster and merge results from individual compute nodes before sending them out, significantly reducing required network bandwidth. This approach allows a ZOID version of the application to handle a larger number of subbands per I/O node, or even to bypass the input cluster altogether, plugging the input from remote receiver stations directly into the I/O nodes. Using the resources more efficiently can result in considerable savings.
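
To make the plug-in idea concrete, the following is a minimal sketch of how a forwarding daemon might dispatch calls to a plug-in that merges results from several compute nodes before sending anything out. It is an illustration only and not ZOID's actual interface: `ForwardingDaemon`, `MergePlugin`, `register`, and `forward` are all hypothetical names, and the element-wise sum stands in for whatever merge operation an application would really need.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical types and names throughout; this is not ZOID's real API.
Handler = Callable[[int, List[float]], Optional[List[float]]]


class ForwardingDaemon:
    """Toy stand-in for an I/O-node daemon that dispatches forwarded calls to plug-ins."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        # A plug-in is just a callable registered under a name.
        self._plugins[name] = handler

    def forward(self, name: str, node_id: int, payload: List[float]) -> Optional[List[float]]:
        # Hand the compute node's payload to the named plug-in.
        return self._plugins[name](node_id, payload)


class MergePlugin:
    """Sums partial results element-wise and emits once every node has reported."""

    def __init__(self, num_nodes: int) -> None:
        self.num_nodes = num_nodes
        self._pending: Dict[int, List[float]] = {}

    def __call__(self, node_id: int, payload: List[float]) -> Optional[List[float]]:
        self._pending[node_id] = payload
        if len(self._pending) < self.num_nodes:
            return None  # Still waiting for other compute nodes.
        # Assumed merge operation: element-wise sum across all nodes.
        merged = [sum(values) for values in zip(*self._pending.values())]
        self._pending.clear()
        return merged


def demo() -> List[Optional[List[float]]]:
    daemon = ForwardingDaemon()
    daemon.register("merge", MergePlugin(num_nodes=3))
    # Three compute nodes each forward a two-element partial result.
    return [daemon.forward("merge", node, [1.0 * node, 2.0]) for node in range(3)]
```

In this sketch three payloads arrive at the I/O node but only one merged payload leaves it, which is the sense in which merging before sending reduces the outbound bandwidth. A real daemon would run such plug-ins in dedicated threads and read from sockets rather than from in-memory lists.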