Thread-local storage extension to support thread-based MPI/OpenMP applications
IWOMP'11 Proceedings of the 7th international conference on OpenMP in the Petascale era
Processor virtualization is a technique in which the programmer divides a computation into many entities, called virtual processors, which are then mapped onto the available physical processors. The number of virtual processors is typically larger than the number of physical processors: for an MPI program, the user decomposes the computation into more MPI tasks than there are physical processors. This over-decomposition allows computation and communication to overlap and enables dynamic load balancing. User-level threads are often used to implement virtual processors because they are generally faster to create, manage, and migrate than heavyweight processes or kernel threads. However, user-level threads raise a problem with private data: all threads in a process share the same global variables, so they break the private-address-space assumption that MPI programs typically rely on. In this paper, we propose a new approach to privatizing data in user-level threads. The approach is based on thread-local storage (TLS), a mechanism commonly available to kernel threads. We apply this technique so that MPI programs can run in a virtualized environment while preserving their original semantics. We show that this alternative provides a more efficient context switch, a lower migration cost, and a simpler implementation than other privatization approaches.