POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Scanning polyhedra with DO loops
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Computer organization & design: the hardware/software interface
Computer organization & design: the hardware/software interface
Software overhead in messaging layers: where does the time go?
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
U-Net: a user-level network interface for parallel and distributed computing
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Network interface for protected, user-level communication
Network interface for protected, user-level communication
ICS '98 Proceedings of the 12th international conference on Supercomputing
Optimal scheduling for UET/VET-UCT generalized n-dimensional grid task graphs
Journal of Parallel and Distributed Computing
Automatic code generation for executing tiled nested loops onto parallel architectures
Proceedings of the 2002 ACM symposium on Applied computing
On Supernode Transformation with Minimized Total Running Time
IEEE Transactions on Parallel and Distributed Systems
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
VIA over SCI: Consequences of a Zero Copy Implementation and Comparison with VIA over Myrinet
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Minimizing Completion Time for Loop Tiling with Computation and Communication Overlapping
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
The SCI Standard and Applications of SCI
SCI: Scalable Coherent Interface, Architecture and Software for High-Performance Compute Clusters
Hyperplane Grouping and Pipelined Schedules: How to Execute Tiled Loops Fast on Clusters of SMPs
The Journal of Supercomputing
Hi-index | 0.00 |
This paper describes the performance benefits attained using enhanced network interfaces to achieve low latency communication. We present a novel, pipelined scheduling approach which takes advantage of DMA communication mode, to send data to other nodes, while the CPUs are performing calculations. We also use zero-copy communication through pinned-down physical memory regions, provided by NIC's driver modules. Our testbed concerns the parallel execution of tiled nested loops onto a cluster of SMP nodes with single PCI-SCI NICs inside each node. In order to schedule tiles, we apply a hyperplane-based grouping transformation to the tiled space, so as to group together independent neighboring tiles and assign them to the same SMP node. Experimental evaluation illustrates that memory mapped NICs with enhanced communication features enable the use of a more advanced pipelined (overlapping) schedule, which considerably improves performance, compared to an ordinary blocking schedule, implemented with conventional, CPU and kernel bounded, communication primitives.