The main contributors to message delivery latency in message passing environments are the copying operations needed to transfer and bind a received message to the consuming process or thread. To reduce this copying overhead, we introduce architectural extensions comprising a specialized network cache and instructions. In this work, we study the overhead and cache pollution that the operating system and the communications stack can introduce, as exemplified by Linux, TCP/IP, and MVIA. We introduce this overhead into our simulation environment and study its effects on our proposed extensions. This allows us to compare the performance of an application running on a system that incorporates our extensions with the performance of the same application running on a standard system. The results show that our proposed approach can improve the performance of MPI applications by 10% to 20%.