Analysis of interprocedural side effects in a parallel programming environment
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Memory coherence in shared virtual memory systems
ACM Transactions on Computer Systems (TOCS)
GIVE-N-TAKE—a balanced code placement framework
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Journal of Parallel and Distributed Computing - Special issue on scalability of parallel algorithms and architectures
Tempest and typhoon: user-level shared memory
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Efficient support for irregular applications on distributed-memory machines
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Techniques for reducing consistency-related communication in distributed shared-memory systems
ACM Transactions on Computer Systems (TOCS)
Message passing versus distributed shared memory on networks of workstations
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Interprocedural compilation of irregular applications for distributed memory machines
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Index array flattening through program transformation
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
An integrated compile-time/run-time software distributed shared memory system
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Automatic compiler-inserted I/O prefetching for out-of-core applications
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
Memory consistency and event ordering in scalable shared-memory multiprocessors
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
An Implementation of Interprocedural Bounded Regular Section Analysis
IEEE Transactions on Parallel and Distributed Systems
Compiler Analysis for Irregular Problems in Fortran D
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Performance evaluation of the Orca shared-object system
ACM Transactions on Computer Systems (TOCS)
ICS '98 Proceedings of the 12th international conference on Supercomputing
Accurately Selecting Block Size at Runtime in Pipelined Parallel Programs
International Journal of Parallel Programming
ICS '01 Proceedings of the 15th international conference on Supercomputing
Accurate data redistribution cost estimation in software distributed shared memory systems
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
OpenMP on networks of workstations for software DSMs
Journal of Computer Science and Technology
Improving Compiler and Run-Time Support for Irregular Reductions Using Local Writes
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Compilation and Runtime-Optimizations for Software Distributed Shared Memory
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Compile-time Synchronization Optimizations for Software DSMs
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Efficient support for pipelining in software distributed shared memory systems
Real-time system security
CAS-DSM: a compiler assisted software distributed shared memory
International Journal of Parallel Programming
IEEE Transactions on Knowledge and Data Engineering
Combined compile-time and runtime-driven, pro-active data movement in software DSM systems
LCR '04 Proceedings of the 7th workshop on Workshop on languages, compilers, and run-time support for scalable systems
A methodology for detailed performance modeling of reduction computations on SMP machines
Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
Optimizing irregular shared-memory applications for distributed-memory systems
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Optimizing irregular shared-memory applications for clusters
Proceedings of the 22nd annual international conference on Supercomputing
Programming matrix algorithms-by-blocks for thread-level parallelism
ACM Transactions on Mathematical Software (TOMS)
Runtime address space computation for SDSM systems
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Compiler and runtime support for shared memory parallelization of data mining algorithms
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Hi-index | 0.00 |
We investigate the use of a software distributed shared memory (DSM) layer to support irregular computations on distributed memory machines. Software DSM supports irregular computation through demand fetching of data in response to memory access faults. With the addition of a very limited form of compiler support, namely the identification of the section of the indirection array accessed by each processor, many of these on-demand page fetches can be aggregated into a single message, and prefetched prior to the access fault.We have measured the performance of this approach for two irregular applications, moldyn and nbf, using the Tread-Marks DSM system on an 8-processor IBM SP2. We find that it has similar performance to the inspector-executor method supported by the CHAOS run-time library, while requiring much simpler compile-time support. For moldyn, it is up to 23% faster than CHAOS, depending on the input problem's characteristics; and for nbf, it is no worse than 14% slower. If we include the execution time of the inspector, the software DSM-based approach is always faster than CHAOS. The advantage of this approach increases as the frequency of changes to the indirection array increases. The disadvantage of this approach is the potential for false sharing overhead when the data set is small or has poor spatial locality.