We conjecture that a paged memory with page migration by the operating system may be an effective system environment for a local/remote shared memory architecture executing a single parallel computation. Implementing a paged memory in such an architecture raises several page table management issues: page table placement, page table replication level, and page table storage overhead. We discuss these issues, propose alternative solutions, and present an experimental evaluation of the solutions.

The experiments were conducted using software-implemented page tables on a 32-node BBN Butterfly. They investigated the case of a single shared-memory parallel computation with one user process per processor. The implementation captures the costs of page table entry locking and reference information updating. Each user process has a copy of the computation's code and non-shared variables in local memory; only shared data references use the page tables. A migration daemon runs on a separate processor, periodically unblocking itself and examining the page tables to make policy decisions concerning page migration.

The conclusions drawn include the following: 1) a fully replicated page-indexed page table significantly reduces network, memory, and lock contention in comparison to a single copy; 2) a fully replicated page-indexed page table faces a severe memory utilization problem in large-scale architectures; and 3) a proposed approach based on inverted page tables appears to be a promising alternative to a fully replicated page-indexed page table.
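The memory utilization concern in the second conclusion can be illustrated with a back-of-the-envelope calculation. The sketch below is not the paper's model or its proposed design: it compares a fully replicated page-indexed table (one entry per shared virtual page, copied to every node) with a generic inverted table (one entry per local physical frame on each node). All names and parameters are hypothetical, including the 16-byte entry, the 4 KB page, the 1024 frames per node, and the assumption that the shared virtual space is as large as total physical memory.

```python
def replicated_table_bytes(num_nodes, shared_pages, entry_bytes):
    """Storage when every node holds a full page-indexed table."""
    return num_nodes * shared_pages * entry_bytes


def inverted_table_bytes(num_nodes, frames_per_node, entry_bytes):
    """Storage when each node keeps one entry per local physical frame."""
    return num_nodes * frames_per_node * entry_bytes


def overhead(table_bytes, num_nodes, frames_per_node, page_bytes):
    """Page table storage as a fraction of total physical memory."""
    return table_bytes / (num_nodes * frames_per_node * page_bytes)


def compare(num_nodes, frames_per_node=1024, page_bytes=4096, entry_bytes=16):
    """Return (replicated_overhead, inverted_overhead) for one machine size.

    Assumption: the shared virtual space has as many pages as the machine
    has physical frames. The default sizes are illustrative only.
    """
    shared_pages = num_nodes * frames_per_node
    rep = replicated_table_bytes(num_nodes, shared_pages, entry_bytes)
    inv = inverted_table_bytes(num_nodes, frames_per_node, entry_bytes)
    return (
        overhead(rep, num_nodes, frames_per_node, page_bytes),
        overhead(inv, num_nodes, frames_per_node, page_bytes),
    )


if __name__ == "__main__":
    for n in (32, 256, 1024):
        rep, inv = compare(n)
        print(f"{n:5d} nodes: replicated {rep:.3f}, inverted {inv:.4f}")
```

Under these assumptions, the replicated table's share of memory grows linearly with the number of nodes, while the inverted table's share stays constant. This is consistent with the abstract's claim that full replication becomes a problem at large scale, although the actual figures depend on entry layout and address-space size, which the abstract does not specify.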