Memory bank aware dynamic loop scheduling
Proceedings of the conference on Design, automation and test in Europe
One of the major factors that can slow down the widespread adoption of embedded chip multiprocessors is the lack of efficient software support. In particular, automated code parallelizers are badly needed, since it is not realistic to expect an average programmer to parallelize a large, complex embedded application over multiple processors while simultaneously accounting for factors such as code density, data locality, performance, power, and code resilience. Notably, the increasing use of software-managed scratch-pad memory (SPM) components in embedded systems requires SPM-conscious code parallelization. Motivated by this observation, this paper proposes a novel compiler-based, SPM-conscious loop scheduling strategy for array/loop-based embedded applications. The strategy pursues two objectives. First, the sets of loop iterations assigned to different processors should take approximately the same amount of time to finish. Second, the set of iterations assigned to a processor should exhibit high data reuse. Satisfying these two objectives minimizes the parallel execution time of the application at hand. To achieve them, our scheduling strategy distributes loop iterations across parallel processors in an SPM-conscious manner: the compiler analyzes the loop, identifies the potential SPM hits and misses, and distributes loop iterations over processors so that the processors have roughly equal execution times. Our experimental results so far indicate that the proposed approach generates much better results than existing loop schedulers. Specifically, it brings 18.9%, 22.4%, and 11.1% improvements in parallel execution time (on a chip multiprocessor with 8 cores) over a previously proposed static scheduler, a dynamic scheduler, and an alternate locality-conscious scheduler, respectively.
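The core idea of distributing iterations so that processors finish at roughly the same time can be illustrated with a minimal sketch. This is not the paper's actual algorithm: the cost constants, the `spm_resident` set, and the greedy contiguous partitioning below are all illustrative assumptions. Each iteration is assigned an estimated cost depending on whether its data is expected to hit or miss in the scratch-pad memory, and contiguous chunks of iterations are then cut so that each processor receives an approximately equal share of the total cost.

```python
HIT_COST = 1    # assumed cycles when an iteration's data resides in the SPM
MISS_COST = 10  # assumed cycles when the data must be fetched from off-chip memory

def iteration_cost(i, spm_resident):
    """Estimated cost of iteration i, given the set of SPM-resident elements."""
    return HIT_COST if i in spm_resident else MISS_COST

def partition_iterations(n_iters, n_procs, spm_resident):
    """Split [0, n_iters) into n_procs contiguous chunks of roughly equal cost."""
    costs = [iteration_cost(i, spm_resident) for i in range(n_iters)]
    target = sum(costs) / n_procs          # ideal per-processor workload
    chunks, start, acc = [], 0, 0.0
    for i, c in enumerate(costs):
        acc += c
        # close a chunk once it reaches its share, keeping chunks for the rest
        if acc >= target and len(chunks) < n_procs - 1:
            chunks.append((start, i + 1))
            start, acc = i + 1, 0.0
    chunks.append((start, n_iters))
    return chunks

# Example: 32 iterations on 4 processors; elements 0-15 are SPM-resident.
parts = partition_iterations(32, 4, set(range(16)))
```

With these assumed costs, processors that execute the cheap SPM-hit iterations receive larger chunks than those stuck with SPM misses, whereas a naive equal-size split would give the miss-heavy processors several times the work of the others.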