In this paper, we study the impact of synchronization and granularity on the performance of parallel systems using an execution-driven simulation technique. We find that even though abundant parallelism may exist at the fine-grain level, synchronization and scheduling strategies determine the ultimate performance of the system. Once these factors are taken into account, loop-iteration-level parallelism appears to be the more appropriate level to exploit. We also study barrier synchronization and data synchronization at the loop-iteration level and find that both schemes are needed for better performance.
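To make the difference between the two loop-iteration-level schemes concrete, here is a minimal sketch. It is not the paper's simulator or code. It assumes two hypothetical loops, where loop 1 computes `b[i] = 2*a[i]` and loop 2 computes `c[i] = b[i] + b[i-1]`, with one thread per iteration. The function names `loop_pair_barrier` and `loop_pair_data_sync` are illustrative only. Python threads show the ordering constraints, not realistic performance.

```python
import threading


def _run(worker, n):
    """Start one thread per iteration index and wait for all of them."""
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def loop_pair_barrier(a):
    """Barrier synchronization: no iteration of loop 2 may start until
    every iteration of loop 1 has finished (hypothetical loop bodies)."""
    n = len(a)
    if n == 0:
        return []
    b = [0] * n
    c = [0] * n
    barrier = threading.Barrier(n)

    def worker(i):
        b[i] = 2 * a[i]                              # loop 1, iteration i
        barrier.wait()                               # wait for ALL of loop 1
        c[i] = b[i] + (b[i - 1] if i > 0 else 0)     # loop 2, iteration i

    _run(worker, n)
    return c


def loop_pair_data_sync(a):
    """Data synchronization: iteration i of loop 2 waits only for the
    specific loop-1 results it reads, signalled by per-element events."""
    n = len(a)
    b = [0] * n
    c = [0] * n
    ready = [threading.Event() for _ in range(n)]    # ready[i] means b[i] is written

    def worker(i):
        b[i] = 2 * a[i]                              # loop 1, iteration i
        ready[i].set()                               # publish b[i]
        if i > 0:
            ready[i - 1].wait()                      # wait only for b[i-1]
        c[i] = b[i] + (b[i - 1] if i > 0 else 0)     # loop 2, iteration i

    _run(worker, n)
    return c


if __name__ == "__main__":
    print(loop_pair_barrier([1, 2, 3, 4]))
    print(loop_pair_data_sync([1, 2, 3, 4]))
```

Both versions compute the same result. With the barrier, iteration `i` of loop 2 waits for all `n` iterations of loop 1. With per-element events, it waits only for iteration `i-1` (it wrote `b[i]` itself), which is a finer-grained ordering. The extra synchronization state it needs (one flag per element) hints at why the choice between the two schemes affects overall performance.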