Enforcing data dependences in parallel algorithms requires synchronization primitives. For simple data dependences, primitives such as the Full/Empty bit in the HEP machine [5] can be very effective. However, if a data dependence cannot be determined at compile time, or if it is very complicated, more efficient synchronization schemes and algorithms are needed.
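To make the simple case concrete, the following is a minimal software sketch of full/empty-bit semantics. It is not the HEP hardware mechanism, which attaches the bit to each memory word. The class `FullEmptyCell`, its method names, and the `run_pipeline` helper are all hypothetical names introduced for illustration. A write waits until the cell is empty and a read waits until it is full, which enforces a producer-to-consumer flow dependence.

```python
import threading


class FullEmptyCell:
    """Hypothetical software model of one memory word guarded by a full/empty bit."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False   # models the full/empty bit; starts empty
        self._value = None

    def write_when_empty(self, value):
        # Producer side: block until the cell is empty, store, mark it full.
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_when_full(self):
        # Consumer side: block until the cell is full, load, mark it empty.
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            value = self._value
            self._full = False
            self._cond.notify_all()
            return value


def run_pipeline(n):
    """Pass n values from one producer thread to one consumer thread, in order."""
    cell = FullEmptyCell()
    results = []

    def producer():
        for i in range(n):
            cell.write_when_empty(i * i)

    def consumer():
        for _ in range(n):
            results.append(cell.read_when_full())

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With a single producer and a single consumer, every read observes exactly the value of the matching write, so the order of `results` follows the order of production. The sketch assumes the dependence is known in advance and is a simple one-to-one pairing. It does not address dependences that are unknown at compile time or that are more complicated.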