Costly page migration is a major obstacle to integrating OpenMP with page-based software distributed shared memory (SDSM) to realize an easy-to-use programming paradigm for SMP clusters. To reduce the impact of page migration overhead on application execution time, previous research has mainly focused on reducing the number of page migrations and on hiding the migration overhead by overlapping computation with communication. We propose the 'collective-prefetch' technique, which overlaps the page migrations themselves and remains effective even when the prior approaches cannot be applied. Experiments with a communication-intensive application show that our technique significantly reduces page migration overhead, cutting overall execution time to 57%–79% of the original.
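The core idea — issuing the page migrations needed by an upcoming computation phase as a batch, so their network latencies overlap instead of accumulating one fault at a time — can be sketched as follows. This is an illustrative analogy only, not the paper's SDSM implementation: `fetch_page`, the simulated latency, and the thread-pool batching are all hypothetical stand-ins.

```python
import concurrent.futures
import time

def fetch_page(page_id):
    """Hypothetical stand-in for one remote page migration (a network round trip)."""
    time.sleep(0.05)  # simulate a fixed per-page network latency
    return page_id

def serial_faulting(pages):
    # Baseline: each access faults and waits for its page in turn,
    # so per-page latencies add up.
    return [fetch_page(p) for p in pages]

def collective_prefetch(pages):
    # Collective prefetch: issue all migrations for the phase at once,
    # so their latencies overlap rather than accumulate.
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(pages)) as pool:
        return list(pool.map(fetch_page, pages))

pages = list(range(8))

t0 = time.perf_counter()
serial_faulting(pages)
t_serial = time.perf_counter() - t0

t0 = time.perf_counter()
collective_prefetch(pages)
t_batch = time.perf_counter() - t0

print(t_batch < t_serial)  # the overlapped batch finishes markedly sooner
```

With 8 pages at 50 ms each, the serial path takes roughly 0.4 s while the overlapped batch takes close to a single round trip — the same latency-overlap effect the collective-prefetch technique targets, though a real SDSM overlaps actual protocol messages rather than threads.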