Threads and input/output in the synthesis kernal
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
Lazy task creation: a technique for increasing the granularity of parallel programs
LFP '90 Proceedings of the 1990 ACM conference on LISP and functional programming
Algorithms for scalable synchronization on shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
A methodology for implementing highly concurrent data objects
ACM Transactions on Programming Languages and Systems (TOPLAS)
Performance studies of Id on the Monsoon dataflow system
Journal of Parallel and Distributed Computing - Special issue on dataflow and multithreaded architectures
Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The SPARC architecture manual (version 9)
The SPARC architecture manual (version 9)
Compiler transformations for high-performance computing
ACM Computing Surveys (CSUR)
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Memory consistency models for shared-memory multiprocessors
Memory consistency models for shared-memory multiprocessors
The synergy between non-blocking synchronization and operating system structure
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
Simple, fast, and practical non-blocking and blocking concurrent queue algorithms
PODC '96 Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing
ICS '90 Proceedings of the 4th international conference on Supercomputing
Synchronization transformations for parallel computing
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Commutativity analysis: a new analysis technique for parallelizing compilers
ACM Transactions on Programming Languages and Systems (TOPLAS)
Lock coarsening: eliminating lock overhead in automatically parallelized object-based programs
Journal of Parallel and Distributed Computing
Eliminating synchronization overhead in automatically parallelized programs using dynamic feedback
ACM Transactions on Computer Systems (TOCS)
Symbolic execution and program testing
Communications of the ACM
Multiple Reservations and the Oklahoma Update
IEEE Parallel & Distributed Technology: Systems & Technology
M-Structures: Extending a Parallel, Non-strict, Functional Language with State
Proceedings of the 5th ACM Conference on Functional Programming Languages and Computer Architecture
Gprof: A call graph execution profiler
SIGPLAN '82 Proceedings of the 1982 SIGPLAN symposium on Compiler construction
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Pointer analysis for structured parallel programs
ACM Transactions on Programming Languages and Systems (TOPLAS)
Speculative synchronization: applying thread-level speculation to explicitly parallel applications
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Eliminating synchronization bottlenecks using adaptive replication
ACM Transactions on Programming Languages and Systems (TOPLAS)
ECOOP '02 Proceedings of the 16th European Conference on Object-Oriented Programming
Analysis of Multithreaded Programs
SAS '01 Proceedings of the 8th International Symposium on Static Analysis
Even Better DCAS-Based Concurrent Deques
DISC '00 Proceedings of the 14th International Conference on Distributed Computing
Stabilizers: a modular checkpointing abstraction for concurrent functional programs
Proceedings of the eleventh ACM SIGPLAN international conference on Functional programming
Modular Checkpointing for Atomicity
Electronic Notes in Theoretical Computer Science (ENTCS)
Using early phase termination to eliminate load imbalances at barrier synchronization points
Proceedings of the 22nd annual ACM SIGPLAN conference on Object-oriented programming systems and applications
Journal of Parallel and Distributed Computing
A (condensed) parametric study of optimistic computation in wide-area, distributed environments
Proceedings of the 15th ACM Mardi Gras conference: From lightweight mash-ups to lambda grids: Understanding the spectrum of distributed computing requirements, applications, tools, infrastructures, interoperability, and the incremental adoption of key capabilities
Synchronization optimizations for efficient execution on multi-cores
Proceedings of the 23rd international conference on Supercomputing
Adaptive locks: Combining transactions and locks for efficient concurrency
Journal of Parallel and Distributed Computing
Lightweight checkpointing for concurrent ml
Journal of Functional Programming
Allocating memory in a lock-free manner
ESA'05 Proceedings of the 13th annual European conference on Algorithms
Hi-index | 0.00 |
This article presents our experience using optimistic synchronization to implement fine-grain atomic operations in the context of a parallelizing compiler for irregular, object-based computations. Our experience shows that the synchronization requirements of these programs differ significantly from those of traditional parallel computations, which use loop nests to access dense matrices using affine access functions. In addition to coarse-grain barrier synchronization, our irregular computations require synchronization primitives that support efficient fine-grain atomic operations. The standard implementation mechanism for atomic operations uses mutual exclusion locks. But the overhead of acquiring and releasing locks can reduce the performance. Locks can also consume significant amounts of memory. Optimistic synchronization primitives such as loud-linked/store conditional are an attractive alternative. They require no additional memory and eliminate the use of heavyweight blocking synchronization constructs. We evaluate the effectiveness of optimistic synchronization by comparing experimental results from two versions of a parallelizing compiler for irregular, object-based computations. One version generates code that uses mutual exclusion locks to make operations execute atomically. The other version generates code that uses mutual exclusion locks to make operations execute atomically. The other version uses optimistic synchronization. We used this compiler to automatically parallelize three irregular, object-based benchmark applications of interest to the scientific and engineering computation community. The presented experimental results indicate that the use of optimistic synchronization in this context can significantly reduce the memory consumption and improve the overall performance.