Compiler and software distributed shared memory support for irregular applications

Authors:
Honghui Lu;Alan L. Cox;Sandhya Dwarkadas;Ramakrishnan Rajamony;Willy Zwaenepoel
Affiliations:
Department of Electrical and Computer Engineering, Rice University;Department of Computer Science, Rice University;Department of Computer Science, University of Rochester;Department of Electrical and Computer Engineering, Rice University;Department of Computer Science, Department of Electrical and Computer Engineering, Rice University
Venue:
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
1997

Citing 16
Cited 24

Analysis of interprocedural side effects in a parallel programming environment

Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Memory coherence in shared virtual memory systems

ACM Transactions on Computer Systems (TOCS)
GIVE-N-TAKE—a balanced code placement framework

PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Communication optimizations for irregular scientific computations on distributed memory architectures

Journal of Parallel and Distributed Computing - Special issue on scalability of parallel algorithms and architectures
Tempest and typhoon: user-level shared memory

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Efficient support for irregular applications on distributed-memory machines

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Techniques for reducing consistency-related communication in distributed shared-memory systems

ACM Transactions on Computer Systems (TOCS)
Message passing versus distributed shared memory on networks of workstations

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Interprocedural compilation of irregular applications for distributed memory machines

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Index array flattening through program transformation

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
TreadMarks: Shared Memory Computing on Networks of Workstations

Computer
An integrated compile-time/run-time software distributed shared memory system

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Automatic compiler-inserted I/O prefetching for out-of-core applications

OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
Memory consistency and event ordering in scalable shared-memory multiprocessors

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
An Implementation of Interprocedural Bounded Regular Section Analysis

IEEE Transactions on Parallel and Distributed Systems
Compiler Analysis for Irregular Problems in Fortran D

Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing

Performance evaluation of the Orca shared-object system

ACM Transactions on Computer Systems (TOCS)
Prefetching on the Cray-T3E

ICS '98 Proceedings of the 12th international conference on Supercomputing
Accurately Selecting Block Size at Runtime in Pipelined Parallel Programs

International Journal of Parallel Programming
Evaluating the impact of memory system performance on software prefetching and locality optimizations

ICS '01 Proceedings of the 15th international conference on Supercomputing
Accurate data redistribution cost estimation in software distributed shared memory systems

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Dynamic adaptation to available resources for parallel computing in an autonomous network of workstations

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Design issues for a high-performance distributed shared memory on symmetrical multiprocessor clusters

Cluster Computing
OpenMP on networks of workstations for software DSMs

Journal of Computer Science and Technology
Improving Compiler and Run-Time Support for Irregular Reductions Using Local Writes

LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Compiler and Run-Time Support for Adaptive Load Balancing in Software Distributed Shared Memory Systems

LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Compilation and Runtime-Optimizations for Software Distributed Shared Memory

LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Compile-time Synchronization Optimizations for Software DSMs

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Efficient support for pipelining in software distributed shared memory systems

Real-time system security
CAS-DSM: a compiler assisted software distributed shared memory

International Journal of Parallel Programming
Shared Memory Parallelization of Data Mining Algorithms: Techniques, Programming Interface, and Performance

IEEE Transactions on Knowledge and Data Engineering
Combined compile-time and runtime-driven, pro-active data movement in software DSM systems

LCR '04 Proceedings of the 7th workshop on languages, compilers, and run-time support for scalable systems
A methodology for detailed performance modeling of reduction computations on SMP machines

Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
Optimizing irregular shared-memory applications for distributed-memory systems

Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
CRAUL: Compiler and run-time integration for adaptation under load

Scientific Programming
Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures

Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Optimizing irregular shared-memory applications for clusters

Proceedings of the 22nd annual international conference on Supercomputing
Programming matrix algorithms-by-blocks for thread-level parallelism

ACM Transactions on Mathematical Software (TOMS)
Runtime address space computation for SDSM systems

LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Compiler and runtime support for shared memory parallelization of data mining algorithms

LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing


Abstract

We investigate the use of a software distributed shared memory (DSM) layer to support irregular computations on distributed memory machines. Software DSM supports irregular computation through demand fetching of data in response to memory access faults. With the addition of a very limited form of compiler support, namely the identification of the section of the indirection array accessed by each processor, many of these on-demand page fetches can be aggregated into a single message and prefetched prior to the access fault.

We have measured the performance of this approach for two irregular applications, moldyn and nbf, using the TreadMarks DSM system on an 8-processor IBM SP2. We find that it has similar performance to the inspector-executor method supported by the CHAOS run-time library, while requiring much simpler compile-time support. For moldyn, it is up to 23% faster than CHAOS, depending on the input problem's characteristics; for nbf, it is at most 14% slower. If we include the execution time of the inspector, the software DSM-based approach is always faster than CHAOS. The advantage of this approach increases as the frequency of changes to the indirection array increases. The disadvantage of this approach is the potential for false sharing overhead when the data set is small or has poor spatial locality.
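
The aggregation idea in the abstract can be illustrated with a minimal sketch. It is not the TreadMarks or CHAOS interface: the function names, the page and element sizes, and the `owner_of` mapping are all hypothetical and chosen only for illustration. Given the section of the indirection array that one processor will traverse, the sketch computes the pages of the shared data array that the section touches, then groups the pages not already held locally by owner, so that each owner receives one request instead of one request per page fault.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # bytes per page (assumed for illustration)
ELEM_SIZE = 8     # bytes per element of the shared data array (assumed)


def pages_for_section(indirection, lo, hi):
    """Pages of the data array touched by data[indirection[i]], lo <= i < hi."""
    return sorted({(idx * ELEM_SIZE) // PAGE_SIZE for idx in indirection[lo:hi]})


def aggregate_requests(pages, owner_of, local_pages):
    """Group pages that are not held locally by owner: one message per owner.

    owner_of is a hypothetical callable mapping a page number to the
    processor that holds a valid copy of that page.
    """
    requests = defaultdict(list)
    for page in pages:
        if page not in local_pages:
            requests[owner_of(page)].append(page)
    return dict(requests)


if __name__ == "__main__":
    indirection = [0, 600, 3, 1030, 512, 2000]
    pages = pages_for_section(indirection, 0, 4)
    requests = aggregate_requests(pages, lambda p: p % 2, local_pages={0})
    for owner, wanted in requests.items():
        print(f"one message to processor {owner} for pages {wanted}")
```

With pure demand fetching, every missing page would cost a separate access fault and request; here the number of messages is bounded by the number of distinct owners. The sketch omits what a real DSM must also handle, such as consistency information and recomputing the page set when the indirection array changes.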