Recently, considerable effort has gone into providing cost-effective shared memory systems by employing software-only solutions on clusters of high-end workstations coupled with high-bandwidth, low-latency commodity networks. Much of the work so far has focused on improving protocols, and some has addressed restructuring applications to perform better on shared virtual memory (SVM) systems. The result of this progress is the promise of good performance on a range of applications, at least in the 16-32 processor range. New system area networks and network interfaces provide significantly lower-overhead, lower-latency, and higher-bandwidth communication in clusters; inexpensive SMPs have become common as the nodes of these clusters; and SVM protocols are now quite mature. With this progress, it is now useful to examine which system bottlenecks stand in the way of effective parallel performance: in particular, which parameters of the communication architecture are most important to improve further relative to processor speed, which are already adequate on modern systems for most applications, and how this will change with technology in the future. Such information can help system designers decide where to focus their efforts in improving performance, and help users determine which system characteristics are appropriate for their applications.

We find that the most important system cost to improve is the overhead of generating and delivering interrupts. Improving network interface (and I/O bus) bandwidth relative to processor speed helps some bandwidth-bound applications, but currently available ratios of bandwidth to processor speed are already adequate for many others. Surprisingly, neither the processor overhead for handling messages nor the occupancy of the communication interface in preparing and pushing packets through the network appears to require much improvement.
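To make the communication parameters named above concrete, the following is a minimal sketch of an additive per-message cost model. It is an illustration only, not the evaluation methodology of this work: the class `CommParams`, the function `remote_request_time`, the additive form of the model, and every numeric value are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class CommParams:
    """Hypothetical per-message cost parameters (illustrative, not measured).

    Times are in microseconds; bandwidth is in bytes per microsecond.
    """
    host_overhead_us: float        # processor time to send or handle a message
    ni_occupancy_us: float         # network interface time to prepare and push a packet
    latency_us: float              # wire and switch latency
    bandwidth_bytes_per_us: float  # network interface (and I/O bus) bandwidth
    interrupt_cost_us: float       # cost of generating and delivering one interrupt


def remote_request_time(p: CommParams, payload_bytes: int, interrupts: int = 1) -> float:
    """Estimate the one-way cost of a message under a simple additive model.

    Assumes the costs do not overlap, which is a simplification.
    """
    transfer_us = payload_bytes / p.bandwidth_bytes_per_us
    return (p.host_overhead_us
            + p.ni_occupancy_us
            + p.latency_us
            + transfer_us
            + interrupts * p.interrupt_cost_us)


if __name__ == "__main__":
    # Placeholder values chosen only to exercise the model.
    params = CommParams(
        host_overhead_us=2.0,
        ni_occupancy_us=1.0,
        latency_us=1.0,
        bandwidth_bytes_per_us=100.0,
        interrupt_cost_us=10.0,
    )
    with_interrupt = remote_request_time(params, payload_bytes=4096)
    without_interrupt = remote_request_time(params, payload_bytes=4096, interrupts=0)
    print(f"with interrupt:    {with_interrupt:.2f} us")
    print(f"without interrupt: {without_interrupt:.2f} us")
```

Varying one field at a time while holding the others fixed gives a rough sense of how sensitive the total is to each parameter, in the spirit of the question the abstract poses. A real study would also need to account for overlap between the terms and for contention.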