The systematic design of systolic arrays
Centre National de Recherche Scientifique on Automata networks in computer science: theory and applications
Strategies for cache and local memory management by global program transformation
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Scalar replacement in the presence of conditional control flow
Software—Practice & Experience
Unifying data and control transformations for distributed shared-memory machines
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Improving instruction-level parallelism by loop unrolling and dynamic memory disambiguation
Proceedings of the 28th annual international symposium on Microarchitecture
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Fusion of Loops for Parallelism and Locality
IEEE Transactions on Parallel and Distributed Systems
Optimal weighted loop fusion for parallel programs
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Automatic storage management for parallel programs
Parallel Computing - Special issues on languages and compilers for parallel computers
A Chip-Multiprocessor Architecture with Speculative Multithreading
IEEE Transactions on Computers
The Organization of Computations for Uniform Recurrence Equations
Journal of the ACM (JACM)
Proceedings of the 14th international conference on Supercomputing
Loop tiling for parallelism
The parallel execution of DO loops
Communications of the ACM
Optimizing memory usage in the polyhedral model
ACM Transactions on Programming Languages and Systems (TOPLAS)
Loop fusion for memory space optimization
Proceedings of the 14th international symposium on Systems synthesis
Loop fusion for clustered VLIW architectures
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design
Scheduling and Automatic Parallelization
Scheduling and Automatic Parallelization
Optimizing inter-nest data locality
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Loop Parallelization in the Polytope Model
CONCUR '93 Proceedings of the 4th International Conference on Concurrency Theory
Automatic Parallelization in the Polytope Model
The Data Parallel Programming Model: Foundations, HPF Realization, and Scientific Applications
New Results on Array Contraction
ASAP '02 Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors
Improving Software Pipelining With Unroll-and-Jam
HICSS '96 Proceedings of the 29th Hawaii International Conference on System Sciences Volume 1: Software Technology and Architecture
On the Complexity of Loop Fusion
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Lattice-based memory allocation
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Optimizing the memory bandwidth with loop fusion
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Improving Data Locality by Array Contraction
IEEE Transactions on Computers
Code Generation in the Polyhedral Model Is Easier Than You Think
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
The Energy Impact of Aggressive Loop Fusion
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Locality-conscious workload assignment for array-based computations in MPSOC architectures
Proceedings of the 42nd annual Design Automation Conference
Data space-oriented tiling for enhancing locality
ACM Transactions on Embedded Computing Systems (TECS)
A polynomial-time algorithm for memory space reduction
International Journal of Parallel Programming
Buffer and register allocation for memory space optimization
ASAP '06 Proceedings of the IEEE 17th International Conference on Application-specific Systems, Architectures and Processors
Multiprocessor, Multithreading and Memory Optimization for On-Chip Multimedia Applications
Journal of Signal Processing Systems
Compiler-directed memory management for heterogeneous MPSoCs
Journal of Systems Architecture: the EUROMICRO Journal
MpAssign: A Framework for Solving the Many-Core Platform Mapping Problem
Software—Practice & Experience
Proceedings of the 5th IBM Collaborative Academia Research Exchange Workshop
Hi-index | 0.00 |
Multiprocessor system-on-a-chip (MPSoC) architectures have received a lot of attention in the past years, but few advances in compilation techniques target these architectures. This is particularly true for the exploitation of data locality. Most of the compilation techniques for parallel architectures discussed in the literature are based on a single loop nest. This article presents new techniques that consist in applying loop fusion and tiling to several loop nests and to parallelize the resulting code across different processors. These two techniques reduce the number of memory accesses. However, they increase dependencies and thereby reduce the exploitable parallelism in the code. This article tries to address this contradiction. To optimize the memory space used by temporary arrays, smaller buffers are used as a replacement. Different strategies are studied to optimize the processing time spent accessing these buffers. The experiments show that these techniques yield a significant reduction in the number of data cache misses (30%) and in processing time (50%).