Understanding application performance on shared virtual memory systems

Authors:
Liviu Iftode;Jaswinder Pal Singh;Kai Li
Affiliations:
Department of Computer Science, Princeton University, Princeton, NJ;Department of Computer Science, Princeton University, Princeton, NJ;Department of Computer Science, Princeton University, Princeton, NJ
Venue:
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Year:
1996

Abstract

Many researchers have proposed interesting protocols for shared virtual memory (SVM) systems, and demonstrated performance improvements on parallel programs. However, there is still no clear understanding of the performance potential of SVM systems for different classes of applications. This paper begins to fill this gap by studying the performance of a range of applications in detail and understanding it in light of application characteristics. We first develop a brief classification of the inherent data sharing patterns in the applications, and how they interact with system granularities to yield the communication patterns relevant to SVM systems. We then use detailed simulation to compare the performance of two SVM approaches, Lazy Release Consistency (LRC) and Automatic Update Release Consistency (AURC), with each other and with an all-hardware CC-NUMA approach. We examine how performance is affected by problem size, machine size, key system parameters, and the use of less optimized program implementations. We find that SVM can indeed perform quite well for systems of at least up to 32 processors for several nontrivial applications. However, performance is much more variable across applications than on CC-NUMA systems, and the problem sizes needed to obtain good parallel performance are substantially larger. The hardware-assisted AURC system tends to perform significantly better than the all-software LRC under our system assumptions, particularly when realistic cache hierarchies are used.