FFTs in external or hierarchical memory
The Journal of Supercomputing
A comparison of sorting algorithms for the connection machine CM-2
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Volume rendering on scalable shared-memory MIMD architectures
VVS '92 Proceedings of the 1992 workshop on Volume visualization
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
MGS: a multigrain shared memory system
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Understanding application performance on shared virtual memory systems
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Hiding communication latency and coherence overhead in software DSMs
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
SoftFLASH: analyzing the performance of clustered distributed virtual shared memory
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
VM-based shared memory on low-latency, remote-memory-access networks
Proceedings of the 24th annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Cashmere-2L: software coherent shared memory on a clustered remote-write network
Proceedings of the sixteenth ACM symposium on Operating systems principles
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Scaling application performance on a cache-coherent multiprocessor
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Application scaling under shared virtual memory on a cluster of SMPs
ICS '99 Proceedings of the 13th international conference on Supercomputing
APRIL: a processor architecture for multiprocessing
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ACM Transactions on Computer Systems (TOCS)
The effects of communication parameters on end performance of shared virtual memory clusters
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Using memory-mapped network interfaces to improve the performance of distributed shared memory
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Improving Release-Consistent Shared Virtual Memory using Automatic Update
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Home-Based SVM Protocols for SMP Clusters: Design and Performance
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
Hi-index | 0.00 |
Recently much effort has been spent on providing a shared address space abstraction on clusters of small-scale symmetric multiprocessors. However, advances in technology will soon make it possible to construct these clusters with larger-scale cc-NUMA nodes, connected with non-coherent networks that offer latencies and bandwidth comparable to interconnection networks used in hardware cache-coherent systems. The shared memory abstraction can be provided on these systems in software across nodes and hardware within nodes.Recent simulation results have demonstrated that certain features of modern system area networks can be used to greatly reduce shared virtual memory (SVM) overheads [5,19]. In this work we leverage these results and we use detailed system emulation to investigate building future software shared memory clusters. We use an existing, large-scale hardware cache-coherent system with 64 processors to emulate a complete future cluster. We port our existing infrastructure (communication layer and shared memory protocol) on this system and study the behavior of a set of real applications. We present results for both 32- and 64-processor system configurations.We find that: (i) System emulation is invaluable in quantifying potential benefits from changes in the technology of commodity components. More importantly, it reveals potential problems in future systems that are easily overlooked in simulation studies. Thus, system emulation should be used along with other modeling techniques (e.g., simulation, implementation) to investigate future trends. (ii) Our work shows that current SVM protocols can only partially take advantage of faster interconnects and wider nodes due to operating system and architectural implications. We quantify the related issues and identify the areas where more research is required for future SVM clusters.