Fbufs: a high-bandwidth cross-domain transfer facility
SOSP '93 Proceedings of the fourteenth ACM symposium on Operating systems principles
Performance of hybrid message-passing and shared-memory parallelism for discrete element modeling
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
MPI versus MPI+OpenMP on IBM SP for the NAS benchmarks
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Dual-level parallelism for deterministic and stochastic CFD problems
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
LiMIC: Support for High-Performance MPI Intra-node Communication on Linux Cluster
ICPP '05 Proceedings of the 2005 International Conference on Parallel Processing
Design and Evaluation of Nemesis, a Scalable, Low-Latency, Message-Passing Communication Subsystem
CCGRID '06 Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid
Data Transfers between Processes in an SMP System: Performance Study and Application to MPI
ICPP '06 Proceedings of the 2006 International Conference on Parallel Processing
ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
Implementation and shared-memory evaluation of MPICH2 over the nemesis communication subsystem
EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
Architecture of the Component Collective Messaging Interface
International Journal of High Performance Computing Applications
Kernel-Assisted MPI Collective Communication among Many-core Clusters
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
KNEM: A generic and scalable kernel-assisted intra-node MPI communication framework
Journal of Parallel and Distributed Computing
Proposing a new task model towards many-core architecture
Proceedings of the First International Workshop on Many-core Embedded Systems
Hi-index | 0.00 |
This paper describes SMARTMAP, an operating system technique that implements fixed offset virtual memory addressing. SMARTMAP allows the application processes on a multi-core processor to directly access each other's memory without the overhead of kernel involvement. When used to implement MPI, SMARTMAP eliminates all extraneous memory-to-memory copies imposed by UNIX-based shared memory strategies. In addition, SMARTMAP can easily support operations that UNIX-based shared memory cannot, such as direct, in-place MPI reduction operations and one-sided get/put operations. We have implemented SMARTMAP in the Catamount lightweight kernel for the Cray XT and modified MPI and Cray SHMEM libraries to use it. Micro-benchmark performance results show that SMARTMAP allows for significant improvements in latency, bandwidth, and small message rate on a quad-core processor.