Efficient implementations of DSP applications are critical for many embedded systems. Optimising compilers for application programs written in C largely focus on code generation and scheduling, which, as they mature, are providing diminishing returns. This paper empirically evaluates another approach, namely high-level source-to-source transformations. High-level techniques were applied to the DSPstone benchmarks on three platforms: the TriMedia TM-1000, the Texas Instruments TMS320C6201, and the Analog Devices SHARC ADSP-21160. On average, the best transformation improved performance by a factor of 2.43 across the platforms. In certain cases, a speedup of 5.48 was found for the SHARC, 7.38 for the TM-1, and 2.3 for the C6201. These preliminary results justify further investigation into the use of high-level techniques for embedded compilers.
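As an illustration of what a high-level source-to-source transformation looks like, the sketch below applies loop unrolling by hand to a dot-product kernel of the kind found in DSP benchmark suites. The abstract does not say which transformations were evaluated, so the choice of unrolling, the unroll factor of 4, and the function names `dot_baseline` and `dot_unrolled4` are all assumptions made for this example, not the paper's method.

```c
#include <stddef.h>

/* Baseline kernel: one multiply-accumulate per loop iteration.
 * (Hypothetical example kernel, not taken from the paper.) */
long dot_baseline(const int *a, const int *b, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (long)a[i] * b[i];
    return sum;
}

/* Source-level transformed version: the loop body is unrolled by an
 * assumed factor of 4, with four independent accumulators so that a
 * back end for a wide-issue DSP has more parallel work to schedule.
 * A remainder loop handles lengths that are not a multiple of 4. */
long dot_unrolled4(const int *a, const int *b, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += (long)a[i]     * b[i];
        s1 += (long)a[i + 1] * b[i + 1];
        s2 += (long)a[i + 2] * b[i + 2];
        s3 += (long)a[i + 3] * b[i + 3];
    }

    long sum = s0 + s1 + s2 + s3;

    /* Remainder iterations (at most 3). */
    for (; i < n; i++)
        sum += (long)a[i] * b[i];

    return sum;
}
```

Both functions compute the same result for integer inputs (barring overflow), so the transformed source can be passed to an unmodified native compiler. Whether it runs faster depends on the target and its compiler, which is the kind of question an empirical evaluation has to settle.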