Lazy direct-to-cache transfer during receive operations in a message passing environment

Authors:
Farshad Khunjush;Nikitas J. Dimopoulos
Affiliations:
University of Victoria, Victoria, BC, Canada;University of Victoria, Victoria, BC, Canada
Venue:
Proceedings of the 3rd conference on Computing frontiers
Year:
2006

Abstract

This work focuses on techniques that promise to reduce message delivery latency in Message Passing Interface (MPI) environments. The main contributors to message delivery latency in message passing environments are the copying operations needed to transfer a received message and bind it to the consuming process or thread. To reduce this copying overhead and to move toward finer granularity, we introduce architectural extensions comprising a specialized network cache and instructions to manage its operation. We study the caching environment and evaluate a new technique called Lazy Direct-to-Cache Transfer (DTCT). Our simulations show that messages can be bound to and kept in a network cache, where they persist long enough to be consumed. We also demonstrate that lazy DTCT significantly reduces access latency in I/O-intensive environments, such as message passing configurations and SMPs, without polluting the data cache.