Using System Emulation to Model Next-Generation Shared Virtual Memory Clusters

Authors:
Angelos Bilas;Courtney R. Gibson;Reza Azimi;Rosalia Christodoulopoulou;Peter Jamieson
Affiliations:
Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada M5S 3G4;Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada M5S 3G4;Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada M5S 3G4;Department of Computer Science, University of Toronto, Toronto, ON, Canada M5S 3G4;Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada M5S 3G4
Venue:
Cluster Computing
Year:
2003

Citing 24
Cited 0

FFTs in external or hierarchical memory

The Journal of Supercomputing
A comparison of sorting algorithms for the connection machine CM-2

SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Volume rendering on scalable shared-memory MIMD architectures

VVS '92 Proceedings of the 1992 workshop on Volume visualization
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
MGS: a multigrain shared memory system

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Understanding application performance on shared virtual memory systems

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Hiding communication latency and coherence overhead in software DSMs

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
SoftFLASH: analyzing the performance of clustered distributed virtual shared memory

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Performance evaluation of two home-based lazy release consistency protocols for shared virtual memory systems

OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
Application restructuring and performance portability on shared virtual memory and hardware-coherent multiprocessors

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
VM-based shared memory on low-latency, remote-memory-access networks

Proceedings of the 24th annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server

Proceedings of the 24th annual international symposium on Computer architecture
Cashmere-2L: software coherent shared memory on a clustered remote-write network

Proceedings of the sixteenth ACM symposium on Operating systems principles
Using network interface support to avoid asynchronous protocol processing in shared virtual memory systems

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Scaling application performance on a cache-coherent multiprocessor

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Application scaling under shared virtual memory on a cluster of SMPs

ICS '99 Proceedings of the 13th international conference on Supercomputing
APRIL: a processor architecture for multiprocessing

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Multigrain shared memory

ACM Transactions on Computer Systems (TOCS)
The effects of communication parameters on end performance of shared virtual memory clusters

SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Myrinet: A Gigabit-per-Second Local Area Network

IEEE Micro
Using memory-mapped network interfaces to improve the performance of distributed shared memory

HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Improving Release-Consistent Shared Virtual Memory using Automatic Update

HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Home-Based SVM Protocols for SMP Clusters: Design and Performance

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
The NUMAchine Multiprocessor

ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Recently much effort has been spent on providing a shared address space abstraction on clusters of small-scale symmetric multiprocessors. However, advances in technology will soon make it possible to construct these clusters with larger-scale cc-NUMA nodes, connected with non-coherent networks that offer latencies and bandwidth comparable to interconnection networks used in hardware cache-coherent systems. The shared memory abstraction can be provided on these systems in software across nodes and hardware within nodes.Recent simulation results have demonstrated that certain features of modern system area networks can be used to greatly reduce shared virtual memory (SVM) overheads [5,19]. In this work we leverage these results and we use detailed system emulation to investigate building future software shared memory clusters. We use an existing, large-scale hardware cache-coherent system with 64 processors to emulate a complete future cluster. We port our existing infrastructure (communication layer and shared memory protocol) on this system and study the behavior of a set of real applications. We present results for both 32- and 64-processor system configurations.We find that: (i) System emulation is invaluable in quantifying potential benefits from changes in the technology of commodity components. More importantly, it reveals potential problems in future systems that are easily overlooked in simulation studies. Thus, system emulation should be used along with other modeling techniques (e.g., simulation, implementation) to investigate future trends. (ii) Our work shows that current SVM protocols can only partially take advantage of faster interconnects and wider nodes due to operating system and architectural implications. We quantify the related issues and identify the areas where more research is required for future SVM clusters.