As grain size decreases, more parallelism can be found in most programs. Exploiting this smaller-grain parallelism, however, requires more efficient synchronization primitives, because synchronization overhead increases as grains shrink. The granularity of parallelism that a multiprocessor system can exploit depends heavily on the type and efficiency of the synchronization the system supports. For medium-grain parallelism, ordered dependencies such as data dependencies and control dependencies must be enforced to guarantee the correctness of parallel execution. Data synchronization is therefore one of the major sources of synchronization overhead in program execution.

In this paper, we classify synchronization schemes by how they use synchronization variables. We propose a new scheme, the process-oriented scheme, which requires very few synchronization variables and can be supported efficiently by simple hardware in the system.
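The abstract does not spell out how the process-oriented scheme works, so the following is only an illustrative sketch under stated assumptions, not the paper's design. It assumes that each loop iteration is treated as a process with a single progress counter, and that a dependent iteration waits on its predecessor's counter instead of on a per-data-element key. The loop body, the names `run_doacross`, `progress`, and `workers`, and the software condition variable (standing in for the simple hardware the abstract mentions) are all hypothetical.

```python
import threading


def run_doacross(n, workers=4):
    """Run a loop with a distance-1 dependence: a[i] = a[i-1] + i.

    Assumption: one synchronization variable (progress[i]) per iteration,
    rather than one per data element.
    """
    a = [0] * n
    progress = [0] * n              # hypothetical per-process counters
    cond = threading.Condition()    # software stand-in for hardware support

    def iteration(i):
        if i > 0:
            # Wait until the previous iteration has passed its sync point.
            with cond:
                cond.wait_for(lambda: progress[i - 1] >= 1)
        a[i] = (a[i - 1] if i > 0 else 0) + i
        # Advance this iteration's counter and wake any waiting successor.
        with cond:
            progress[i] = 1
            cond.notify_all()

    def worker(w):
        # Iterations are assigned to workers cyclically, in increasing order.
        for i in range(w, n, workers):
            iteration(i)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a


if __name__ == "__main__":
    print(run_doacross(8))
```

In this sketch the number of counters grows with the number of iterations, not with the size of the data being protected. That illustrates why a scheme keyed on processes could need relatively few synchronization variables. A real implementation would presumably reuse a small, fixed pool of counters.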