Parallel programs written in MPI are widely used to build high-performance applications on a variety of platforms. Because of a restriction in the MPI computation model, conventional MPI implementations on shared-memory machines map each MPI node to an OS process, which can suffer serious performance degradation in the presence of multiprogramming. This paper studies compile-time and runtime techniques for enhancing the performance portability of MPI code running on multiprogrammed shared-memory machines. The proposed techniques allow MPI nodes to be executed safely and efficiently as threads. A compile-time transformation eliminates global and static variables in C code by replacing them with node-specific data. The runtime support includes an efficient and provably correct communication protocol that uses lock-free data structures and takes advantage of address-space sharing among threads. Experiments on an SGI Origin 2000 show that our MPI prototype, TMPI, which uses the proposed techniques, is competitive with SGI's native MPI implementation in a dedicated environment and has significant performance advantages in a multiprogrammed environment.
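To illustrate why global and static variables must be eliminated when MPI nodes run as threads, the following is a minimal sketch, not the transformation TMPI actually generates. It assumes POSIX thread-specific data as a stand-in for the paper's node-specific data; the names `counter_key`, `counter_ptr`, `node_main`, and `run_nodes` are hypothetical. A variable that was once a per-process global `counter` becomes a per-thread copy reached through an accessor.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Process-based code would have declared:  static int counter;
 * With nodes as threads, that single copy would be shared by all nodes. */

static pthread_key_t counter_key;                      /* hypothetical name */
static pthread_once_t counter_once = PTHREAD_ONCE_INIT;

static void make_key(void) {
    /* free() is registered as the destructor for each thread's copy. */
    int rc = pthread_key_create(&counter_key, free);
    assert(rc == 0);
}

/* Return the calling thread's private copy of the former global. */
static int *counter_ptr(void) {
    pthread_once(&counter_once, make_key);
    int *p = pthread_getspecific(counter_key);
    if (p == NULL) {                   /* first access on this thread */
        p = calloc(1, sizeof *p);      /* zero-initialized, like a C global */
        assert(p != NULL);
        pthread_setspecific(counter_key, p);
    }
    return p;
}

struct node_arg { int rank; int result; };

/* Body of one "MPI node" running as a thread (assumed stand-in). */
static void *node_main(void *arg) {
    struct node_arg *a = arg;
    for (int i = 0; i <= a->rank; i++)
        (*counter_ptr())++;            /* was: counter++ */
    a->result = *counter_ptr();
    return NULL;
}

/* Run n nodes as threads; results[i] receives node i's final counter.
 * Returns 0 on success, -1 on failure. */
int run_nodes(int n, int results[]) {
    if (n <= 0) return -1;
    pthread_t *tid = malloc(n * sizeof *tid);
    struct node_arg *args = malloc(n * sizeof *args);
    if (tid == NULL || args == NULL) { free(tid); free(args); return -1; }

    int started = 0;
    for (int i = 0; i < n; i++) {
        args[i].rank = i;
        args[i].result = -1;
        if (pthread_create(&tid[i], NULL, node_main, &args[i]) != 0) break;
        started++;
    }
    for (int i = 0; i < started; i++) pthread_join(tid[i], NULL);
    for (int i = 0; i < n; i++) results[i] = args[i].result;

    free(tid);
    free(args);
    return started == n ? 0 : -1;
}
```

Calling `run_nodes(4, r)` should leave each `r[i]` equal to `i + 1`, because every thread increments only its own copy. With a single shared `static int counter`, the nodes would interfere with one another, which is the hazard the compile-time transformation is meant to remove.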