Network contention hotspots can limit the throughput of parallel disk I/O even when the interconnection network appears to be sufficiently provisioned. We studied I/O hotspots in mesh networks as a function of the spatial layout of an application's compute nodes relative to the I/O nodes. Our analytical modeling and dynamic simulations show that when the I/O nodes are placed along one side of a two-dimensional mesh, realizable I/O throughput is bounded above by four times the per-link network bandwidth. Whether this maximum is reached depends on the spatial layout of jobs, and adding I/O nodes cannot raise it. Applying these results, we devised a new parallel layout allocation strategy (PLAS) that minimizes I/O hotspots and approaches the theoretical best case for parallel I/O throughput. Our I/O performance analysis and processor allocation strategy are applicable to a wide range of contemporary and emerging high-performance computing systems.
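The stated bound can be illustrated with a few lines of arithmetic. The sketch below is not the paper's analytical model. It assumes a hypothetical function name, hypothetical bandwidth figures, and a simplifying assumption that each I/O node can supply a fixed bandwidth of its own. Only the cap of four times the per-link bandwidth comes from the abstract.

```python
def io_throughput_upper_bound(link_bandwidth, num_io_nodes, per_io_node_bandwidth):
    """Illustrative upper bound on a job's I/O throughput.

    The cap of 4 * link_bandwidth is the bound stated in the abstract for
    I/O nodes placed on one side of a 2-D mesh. The supply term
    (num_io_nodes * per_io_node_bandwidth) is an assumption added here
    only to show that extra I/O nodes stop helping once the cap is hit.
    """
    mesh_bound = 4 * link_bandwidth
    supply = num_io_nodes * per_io_node_bandwidth
    return min(mesh_bound, supply)


if __name__ == "__main__":
    # Hypothetical figures: 200 units per link, 100 units per I/O node.
    for n in (1, 2, 4, 8, 16):
        print(n, io_throughput_upper_bound(200.0, n, 100.0))
```

With these hypothetical figures the bound stops growing once eight I/O nodes are in use, which mirrors the observation that adding I/O nodes does not raise the maximum.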