As shared-memory multiprocessors become the dominant commodity source of computation, parallelizing compilers must support mainstream computations that manipulate irregular, pointer-based data structures such as lists, trees, and graphs. Our experience with a parallelizing compiler for this class of applications shows that their synchronization requirements differ significantly from those of traditional parallel computations. Instead of coarse-grain barrier synchronization, irregular computations require synchronization primitives that support efficient fine-grain atomic operations.

The standard implementation mechanism for atomic operations uses mutual exclusion locks. But the overhead of acquiring and releasing locks can reduce performance, and locks can also consume significant amounts of memory. Optimistic synchronization primitives such as load linked/store conditional are an attractive alternative: they require no additional memory and eliminate the use of heavyweight blocking synchronization constructs.

This paper presents our experience using optimistic synchronization to implement fine-grain atomic operations in the context of a parallelizing compiler for irregular, object-based programs. We have implemented two versions of the compiler: one generates code that uses mutual exclusion locks to make operations execute atomically; the other uses optimistic synchronization. This paper presents the first published algorithm that enables compilers to automatically generate optimistically synchronized parallel code. Our experimental results indicate that optimistic synchronization is clearly the superior choice for our set of applications: it significantly reduces memory consumption and improves overall performance.
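To make the contrast concrete, the following is a minimal sketch of the optimistic pattern the abstract describes. Load linked/store conditional is ISA-specific, so this sketch substitutes the C11 compare-and-swap primitive, which expresses the same read-compute-commit-retry discipline; the names `sum` and `atomic_add` are illustrative and not taken from the paper.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical accumulator field updated by many parallel tasks.
   A lock-based compiler would guard it with a per-object lock;
   the optimistic version below retries a compare-and-swap instead. */
static _Atomic int sum = 0;

/* Optimistically add v to sum: read the current value, compute the
   new value, and commit only if no other task intervened in between.
   Compare-and-swap stands in for load linked/store conditional here. */
static void atomic_add(int v) {
    int old = atomic_load_explicit(&sum, memory_order_relaxed);
    int next;
    do {
        next = old + v;
        /* On failure, old is refreshed with the value another task
           installed, and the loop recomputes and retries. */
    } while (!atomic_compare_exchange_weak_explicit(
                 &sum, &old, next,
                 memory_order_acq_rel, memory_order_relaxed));
}

int main(void) {
    atomic_add(5);
    atomic_add(7);
    printf("%d\n", atomic_load(&sum)); /* prints 12 */
    return 0;
}
```

Note that the optimistic version stores no lock word alongside the data and never blocks a waiting task, which corresponds to the memory and performance advantages the abstract reports for the optimistically synchronized compiler.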