In large-scale multiprocessors, whether loosely or tightly coupled, some memory is cheaper to access than other memory. Because managing memory directly on these machines is burdensome for the programmer, much research effort has gone into providing a shared virtual memory (SVM) interface. The success of this endeavor depends heavily on the efficiency of page management strategies. To date, page management has been primarily the responsibility of the operating system and secondarily that of the hardware. Unfortunately, delaying page management decisions entirely until run time can cause an unacceptable loss of efficiency, because of poor data layout and memory reference patterns that are fixed by the end of compile time. For this reason, programmer assistance has occasionally been solicited, but this disrupts the SVM abstraction. Many of these problems may instead be addressable at the compiler level. This is especially promising for array-based languages, where compiler-based analytical technology is most mature. Surprisingly, this possibility is largely unexplored. In this paper, we discuss compiler involvement in areas ranging from loop transformations and scheduling to data layout strategies, page placement decisions, access pattern analysis, and the use of run-time system directives.
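As a rough illustration of why reference patterns fixed at compile time matter for page management, the following sketch counts how often consecutive accesses move to a different page when a row-major array is traversed in two loop orders. It is not taken from the paper: the function name `page_switches`, the array shape, and the page size (in elements) are all assumptions chosen for illustration, and the count is only a crude proxy for page traffic.

```python
def page_switches(rows, cols, page_elems, row_order):
    """Count page changes between consecutive accesses to a row-major array.

    Hypothetical model: element (i, j) lives at linear offset i * cols + j,
    and a page holds `page_elems` consecutive elements.
    """
    if row_order:
        # i outer, j inner: walks memory contiguously
        order = ((i, j) for i in range(rows) for j in range(cols))
    else:
        # j outer, i inner: strides by `cols` elements per access
        order = ((i, j) for j in range(cols) for i in range(rows))

    switches = 0
    prev_page = None
    for i, j in order:
        page = (i * cols + j) // page_elems
        if prev_page is not None and page != prev_page:
            switches += 1
        prev_page = page
    return switches


if __name__ == "__main__":
    # Assumed sizes: a 64 x 64 array with 512 elements per page (8 pages).
    print(page_switches(64, 64, 512, row_order=True))   # 7
    print(page_switches(64, 64, 512, row_order=False))  # 511
```

Under these assumptions, interchanging the two loops changes the number of page transitions from 511 to 7 without touching the data. This is the kind of decision that is available to a compiler but is already fixed by the time a run-time page manager sees the reference stream.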