CAS-DSM: a compiler assisted software distributed shared memory

Authors:
N. P. Manoj;K. V. Manjunath;R. Govindarajan
Affiliations:
Hewlett-Packard India Software Operations, 29 Cunningham Road, Bangalore 560 052, India;Electrical Engineering and Computer Science, University of Michigan Ann Arbor, Michigan;Department of Computer Science and Automation, Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore 560 012, India
Venue:
International Journal of Parallel Programming
Year:
2004

Abstract

Traditional software Distributed Shared Memory (DSM) systems rely on virtual memory management mechanisms to detect accesses to shared memory locations and to maintain their consistency. The resulting involvement of the OS kernel, and the significant overhead associated with it, can be avoided through careful compile-time analysis and code instrumentation. In this paper, we propose such a compiler-assisted software approach (CAS-DSM). In the CAS-DSM implementation, the involvement of the OS kernel is avoided by instrumenting the application code at the source level. The overhead caused by executing the instrumented code is reduced through several aggressive compile-time optimizations. We also address the issue of reducing certain overheads in the polling-based implementation of receiving asynchronous messages. We used SUIF, a public-domain compiler tool, to implement the compile-time analysis, instrumentation, and optimizations. We modified CVM, a publicly available software DSM, to support the instrumentation inserted by the compiler. A detailed performance evaluation of CAS-DSM is reported using a set of Splash/Splash2 parallel application benchmarks on a distributed-memory IBM SP-2 machine. CAS-DSM achieved moderate to good performance improvements over the original CVM implementation for most of the applications. Reducing the overheads in the polling-based implementation improves the performance of CAS-DSM significantly, resulting in an overall improvement of 12-52% over the original CVM implementation.
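
To make the idea of source-level instrumentation concrete, the following is a minimal sketch in C of what a compiler-inserted access check might look like. It is an illustration under stated assumptions, not the CAS-DSM or CVM code: every name here (`check_access`, `fetch_block`, `poll_network`, the state table, the block size, and the polling interval) is hypothetical. The abstract does not say which compile-time optimizations are applied or how polling overhead is reduced, so the hoisted check in `sum_block` and the counter-based poll are only plausible stand-ins for those techniques.

```c
#include <assert.h>
#include <stddef.h>

/* All names and constants below are hypothetical, chosen for illustration. */
enum { WORDS_PER_BLOCK = 1024, NUM_BLOCKS = 8, POLL_INTERVAL = 4 };
enum block_state { INVALID = 0, READ_ONLY = 1, READ_WRITE = 2 };

static int shared_words[NUM_BLOCKS * WORDS_PER_BLOCK];   /* the "shared" region */
static enum block_state block_state_table[NUM_BLOCKS];   /* software access state */
static unsigned miss_count, poll_count, check_count;

/* Stand-in for the coherence protocol: a real DSM would request the block
 * from a remote node here. This sketch only records the state change. */
static void fetch_block(size_t block, enum block_state wanted)
{
    block_state_table[block] = wanted;
    miss_count++;
}

/* Stand-in for checking the network for incoming asynchronous messages. */
static void poll_network(void)
{
    poll_count++;
}

/* The check the compiler would insert before a shared access. It replaces
 * the page-fault trap of a VM-based DSM with an ordinary table lookup. */
static void check_access(const int *addr, enum block_state needed)
{
    ptrdiff_t word = addr - shared_words;
    assert(word >= 0 && word < NUM_BLOCKS * WORDS_PER_BLOCK);
    size_t block = (size_t)word / WORDS_PER_BLOCK;

    if (block_state_table[block] < needed)
        fetch_block(block, needed);

    /* Assumed policy: poll only once every POLL_INTERVAL checks. */
    if (++check_count % POLL_INTERVAL == 0)
        poll_network();
}

/* Instrumented forms of "x = *p" and "*p = v". */
static int shared_read(const int *addr)
{
    check_access(addr, READ_ONLY);
    return *addr;
}

static void shared_write(int *addr, int value)
{
    check_access(addr, READ_WRITE);
    *addr = value;
}

/* Assumed optimization: one check hoisted out of the loop covers every
 * access to the same block, instead of one check per element. */
static long sum_block(size_t block)
{
    const int *base = &shared_words[block * WORDS_PER_BLOCK];
    long total = 0;

    check_access(base, READ_ONLY);
    for (size_t i = 0; i < WORDS_PER_BLOCK; i++)
        total += base[i];
    return total;
}
```

In this sketch, a write to a block in the `INVALID` state takes the software miss path once, after which reads and writes to the same block pay only for the table lookup. The kernel is never involved, which is the property the abstract attributes to the instrumentation approach.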