The program dependence graph and its use in optimization
ACM Transactions on Programming Languages and Systems (TOPLAS)
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Modernizing Legacy Systems: Software Technologies, Engineering Process and Business Practices
Modernizing Legacy Systems: Software Technologies, Engineering Process and Business Practices
Profiling tools for hardware/software partitioning of embedded applications
Proceedings of the 2003 ACM SIGPLAN conference on Language, compiler, and tool for embedded systems
Dynamic Feature Traces: Finding Features in Unfamiliar Code
ICSM '05 Proceedings of the 21st IEEE International Conference on Software Maintenance
Optimizing Compiler for the CELL Processor
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
CellSs: a programming model for the cell BE architecture
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Tracking Objects to Detect Feature Dependencies
ICPC '07 Proceedings of the 15th IEEE International Conference on Program Comprehension
CIGAR: Application Partitioning for a CPU/Coprocessor Architecture
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Massive parallel LDPC decoding on GPU
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
ICPC '08 Proceedings of the 2008 The 16th IEEE International Conference on Program Comprehension
A comparison of programming models for multiprocessors with explicitly managed memory hierarchies
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Experiences with parallelizing a bio-informatics program on the cell BE
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
On-the-fly detection of precise loop nests across procedures on a dynamic binary translation system
Proceedings of the 8th ACM International Conference on Computing Frontiers
Profiling Data-Dependence to Assist Parallelization: Framework, Scope, and Optimization
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
There is a trend towards using accelerators to increase performance and energy efficiency of general-purpose processors. Adoption of accelerators, however, depends on the availability of tools to facilitate programming these devices. In this paper, we present techniques for automatically partitioning programs for execution on accelerators. We call the off-loaded code regions sub-algorithms, which are parts of the program that are loosely connected to the remainder of the program. We present three heuristics for automatically identifying sub-algorithms based on control flow and data flow properties. Analysis of SPECint and MiBench benchmarks shows that on average 12 sub-algorithms are identified (up to 54), covering the full execution time for 27 out of 30 benchmarks. We show that these sub-algorithms are suitable for off-loading them to accelerators by manually implementing sub-algorithms for 2 SPECint benchmarks on the Cell processor.