One potentially attractive way to build large-scale shared-memory machines is to use small- to medium-scale shared-memory machines as clusters interconnected by an off-the-shelf network. A virtual shared-memory software layer can then provide a shared-memory programming environment across the clusters. Because the interconnect within each cluster offers low latency and high bandwidth, there are clear advantages in making the clusters as large as possible. The critical question then becomes whether the latency and bandwidth of the top-level network and the software system are sufficient to support the communication demands generated by the clusters.

To explore these questions, we have built an aggressive kernel implementation of a virtual shared-memory system using SGI multiprocessors and 100 Mbyte/sec HIPPI interconnects. The system obtains speedups on 32 processors (four nodes, eight processors per node plus additional reserved protocol processors) that range from 6.9 on the communication-intensive FFT program to 21.6 on Ocean (both from the SPLASH-2 suite). In general, clustering is effective in reducing internode miss rates, but as the cluster size increases, the remote latency also increases, mostly because of higher TLB synchronization cost, and this offsets the advantages. For communication-intensive applications such as FFT, the overhead of sending out network requests, the limited network bandwidth, and the long network latency prevent good performance. Overall, this approach still appears promising, but our results indicate that large low-latency networks may be needed to make cluster-based virtual shared-memory machines broadly useful as large-scale shared-memory multiprocessors.
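
To put the reported speedups in perspective, they can be converted to parallel efficiency (speedup divided by processor count). The sketch below is illustrative only: the function and variable names are hypothetical, and it assumes the 32 compute processors are the right denominator, leaving out the reserved protocol processors, whose number the abstract does not give.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Return speedup divided by processor count (1.0 means ideal scaling)."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Speedups quoted in the abstract for the 32-processor configuration.
REPORTED_SPEEDUPS = {"FFT": 6.9, "Ocean": 21.6}

# Assumption: count only the 32 compute processors (4 nodes x 8 per node),
# not the additional reserved protocol processors.
COMPUTE_PROCESSORS = 4 * 8

for program, speedup in REPORTED_SPEEDUPS.items():
    efficiency = parallel_efficiency(speedup, COMPUTE_PROCESSORS)
    print(f"{program}: speedup {speedup:.1f}, efficiency {efficiency:.1%}")
```

Under that assumption, FFT runs at roughly 22% efficiency and Ocean at roughly two thirds, which is consistent with the abstract's observation that communication-intensive programs are limited by the top-level network.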