Construction of staples in lattice gauge theory on a parallel computer
Parallel Computing
MPI-LAPI: An Efficient Implementation of MPI for IBM RS/6000 SP Systems
IEEE Transactions on Parallel and Distributed Systems
The Virtual Interface Architecture
IEEE Micro
CANPC '98 Proceedings of the Second International Workshop on Network-Based Parallel Computing: Communication, Architecture, and Applications
Efficient Communication Using Message Prediction for Cluster Multiprocessors
CANPC '00 Proceedings of the 4th International Workshop on Network-Based Parallel Computing: Communication, Architecture, and Applications
Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Architectural Extensions to Support Efficient Communication Using Message Prediction
HPCS '02 Proceedings of the 16th Annual International Symposium on High Performance Computing Systems and Applications
Communications of the ACM - Voting systems
Direct Cache Access for High Bandwidth Network I/O
Proceedings of the 32nd annual international symposium on Computer Architecture
Performance Analysis of System Overheads in TCP/IP Workloads
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Lazy direct-to-cache transfer during receive operations in a message passing environment
Proceedings of the 3rd conference on Computing frontiers
MEDEA '05 Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications , systems and architecture
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference
High-performance local area communication with fast sockets
ATEC '97 Proceedings of the annual conference on USENIX Annual Technical Conference
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
Hi-index | 0.00 |
Communication overhead is the key obstacle to reaching hardware performance limits. The majority is associated with software overhead, a significant portion of which is attributed to message copying. To reduce this copying overhead, we have devised techniques that do not require to copy a received message in order for it to be bound to its final destination. Rather, a late-binding mechanism, which involves address translation and a dedicated cache, facilitates fast access to received messages by the consuming process/thread. We have introduced two policies namely Direct to Cache Transfer (DTCT) and lazy DTCT that determine whether a message after it is bound needs to be transferred into the data cache. We have studied the proposed methods in simulation and have shown their effectiveness in reducing access times to message payloads by the consuming process.