Compiler parallelization of C programs for multi-core DSPs with multiple address spaces

Authors:
Björn Franke;M.F.P. O'Boyle
Affiliations:
University of Edinburgh;University of Edinburgh
Venue:
Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Year:
2003

Citing 5
Cited 4

Compiler optimizations for improving data locality

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Unified compilation techniques for shared and distributed address space machines

ICS '95 Proceedings of the 9th international conference on Supercomputing
Compile Time Barrier Synchronization Minimization

IEEE Transactions on Parallel and Distributed Systems
Integrating loop and data transformations for global optimization

Journal of Parallel and Distributed Computing
Maximizing Multiprocessor Performance with the SUIF Compiler

Computer

Automatic parallelization of embedded software using hierarchical task graphs and integer linear programming

CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Automatic extraction of multi-objective aware pipeline parallelism using genetic algorithms

Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Multi-objective aware extraction of task-level parallelism using genetic algorithms

DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Automatic extraction of pipeline parallelism for embedded heterogeneous multi-core platforms

Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper develops a new approach to compiling C programs for multiple address space, multi-processor DSPs. It integrates a novel data transformation technique that exposes the processor location of partitioned data into a parallelization strategy. When this is combined with a new address resolution mechanism, it generates efficient programs that run on multiple address spaces without using message passing. This approach is applied to the UTDSP benchmark suite and evaluated on a four processor TigerSHARC board, where it is shown to outperform existing approaches and gives an average speedup of 3.25 on the parallel benchmarks.