Cluster computing remains the most cost-effective way to meet the growing demand for computing power. Clusters are typically built from commodity computing hardware combined with specialized interconnection networks (INs). These cluster interconnects differ from commodity networks in offering higher bandwidth, lower latency, lower CPU utilization, and better scalability. Yet even with these sophisticated INs, the latency of a message transfer between two nodes is still orders of magnitude higher than that of a local memory access. Message latency is especially critical for fine-grained communication. An analysis of the latency shows that its main component originates in the I/O system. This paper presents a new mechanism called Ultra Low Latency Message Transfer (ULTRA), which enables message passing with the lowest possible latency. Besides using well-known techniques such as user-level communication, this work focuses on improving the network interface through optimized, efficient use of the I/O system. The ULTRA mechanism and architecture presented here form a highly optimized approach to low latency, limited only by the standard I/O system in use. It permits much closer coupling of the cluster nodes and makes fine-grained communication schemes better suited to cluster computing.