Auto-parallelizing compilers for embedded applications have been unsuccessful due to the widespread use of pointer arithmetic and the complex memory model of multiple-address-space digital signal processors (DSPs). This paper develops, for the first time, a complete auto-parallelization approach that overcomes these issues. It first combines a pointer conversion technique with a new modulo elimination transformation for program recovery, enabling the later parallelization stages. Next, it integrates a novel data transformation technique that exposes the processor location of partitioned data. Combined with a new address resolution mechanism, this generates efficient programs that run on multiple address spaces without message passing. Furthermore, as DSPs have no data cache, an optimization is presented that transforms the program to exploit both remote data locality and local memory bandwidth. Applying this parallelization approach to the DSPstone and UTDSP benchmark suites gives an average speedup of 3.78 on four Analog Devices TigerSHARC TS-101 processors.
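To make the program-recovery step concrete, here is a minimal sketch in C of the two ideas named above. The kernels and all names (x, y, k, n, d) are hypothetical illustrations of the general shape of pointer conversion and modulo elimination, not the paper's actual algorithms.

```c
/* Before: pointer-based idiom common in hand-written DSP code.
 * The walking pointers defeat array dependence analysis. */
void scale_ptr(float *y, const float *x, float k, int n) {
    while (n-- > 0)
        *y++ = k * *x++;
}

/* After pointer conversion: explicit, affine array subscripts that
 * later parallelization stages can analyze. */
void scale_arr(float *y, const float *x, float k, int n) {
    for (int i = 0; i < n; i++)
        y[i] = k * x[i];
}

/* Modulo elimination: a circular-buffer access x[(i + d) % n]
 * (with 0 <= d < n) is split at the wrap-around point so that
 * each loop body uses a plain affine subscript. */
void rotate(float *y, const float *x, int n, int d) {
    for (int i = 0; i < n - d; i++)
        y[i] = x[i + d];          /* i + d < n: no wrap */
    for (int i = n - d; i < n; i++)
        y[i] = x[i + d - n];      /* wrapped portion */
}
```

Once subscripts are affine, standard dependence analysis and data partitioning can proceed.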
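Similarly, the data-transformation and address-resolution steps might be sketched as follows. The sketch assumes a block distribution and a per-processor table of base addresses through which each processor's local memory is visible to the others, as on the TigerSHARC's global memory map; NPROC, BLOCK, local_base, and the function names are all assumed names for illustration, not the paper's interface. The final function hints at the locality optimization: with no data cache, remote reuse is captured in software by staging a block into local memory once.

```c
#define NPROC 4                    /* number of DSPs (hypothetical) */
#define N     1024                 /* global array length (hypothetical) */
#define BLOCK (N / NPROC)          /* block distribution */

/* Each processor's partition as it appears in the global address map. */
extern float *local_base[NPROC];

/* Translate a global index into the owning processor's address space,
 * so a remote element can be loaded directly rather than via message
 * passing. */
static inline float *resolve(int i) {
    int owner  = i / BLOCK;        /* processor owning element i */
    int offset = i % BLOCK;        /* position within that partition */
    return local_base[owner] + offset;
}

/* With no data cache, remote reuse must be managed explicitly: copy a
 * remote block into a local buffer once, then operate out of fast
 * local memory, saving external bus bandwidth on every reuse. */
void stage_block(float *local_buf, int owner) {
    for (int j = 0; j < BLOCK; j++)
        local_buf[j] = local_base[owner][j];
}
```

The design point here is that owner computation (i / BLOCK) and offset computation (i % BLOCK) are the compile-time-visible consequences of the data transformation, which is what lets the generated code address remote data directly across address spaces.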