This paper presents a novel stateless, virtualized communication engine for sub-microsecond latency. Using a Field-Programmable Gate Array (FPGA) based prototype, we show a latency of 970 ns between two machines with our Virtualized Engine for Low Overhead (VELO). The FPGA device is directly connected to the CPUs by a HyperTransport link. The described hardware architecture is optimized for small messages and avoids the overhead typically found with Direct Memory Access (DMA) controlled transfers. The stateless approach allows the hardware unit to be used directly from many threads and processes simultaneously. It provides secure user-level communication with a highly optimized start-up phase. Microbenchmark results are reported for both a proprietary API and OpenMPI.
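The stateless idea can be illustrated with a small software model. This is only a sketch under stated assumptions and not VELO's actual interface: the header layout, the 64-byte message size, and the names `encode_message`, `deliver`, and `mailboxes` are all hypothetical. It shows that when every message carries its own destination and length, the engine needs no per-connection state, so messages from different threads and processes can be interleaved freely.

```python
import struct

# Hypothetical header layout (an assumption, not taken from the paper):
# destination node (16 bit), destination port (16 bit), payload length (32 bit).
HEADER = struct.Struct("<HHI")
# Assumption: one small message fits in a single 64-byte write.
MAX_PAYLOAD = 64 - HEADER.size


def encode_message(dest_node: int, dest_port: int, payload: bytes) -> bytes:
    """Build one self-describing message, as a sender might write it directly."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too large for a single small-message write")
    return HEADER.pack(dest_node, dest_port, len(payload)) + payload


def deliver(message: bytes, mailboxes: dict) -> None:
    """Model of a stateless engine: route using only what the message carries."""
    dest_node, dest_port, length = HEADER.unpack_from(message)
    payload = message[HEADER.size:HEADER.size + length]
    mailboxes.setdefault((dest_node, dest_port), []).append(payload)


# Messages from different senders arrive interleaved; no setup step is needed.
mailboxes = {}
for msg in (
    encode_message(1, 7, b"from thread A"),
    encode_message(1, 9, b"from process B"),
    encode_message(1, 7, b"A again"),
):
    deliver(msg, mailboxes)
```

In this model `deliver` keeps nothing between calls, which is the property that lets many senders share one unit; a DMA-style design would instead set up a descriptor and a transfer per message, which is the overhead the abstract says is avoided for small messages.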