Advances in semiconductor technology allow multiple processor cores to be integrated into a single chip, and heterogeneous multiprocessor systems-on-chip (MPSoCs) have become important platforms for accelerating applications. However, compilation techniques for memory management on MPSoCs still lag behind. This paper presents an automatic memory management framework that orchestrates data movement between local memory and off-chip memory. The framework employs data alignment, hierarchical data distribution, communication generation, loop tiling, and loop splitting. In addition, a communication optimization approach is proposed to improve data reuse. Together, these techniques can reduce off-chip memory accesses and exploit data locality. Experimental results on the Cell BE show that the framework generates efficient code.
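As a rough illustration of the kind of code such a framework targets, the Python sketch below stages data through a small local buffer by combining loop tiling, loop splitting, and explicit transfers. It is a generic example of these techniques, not the framework described in the paper. The names `LOCAL_CAPACITY`, `dma_get`, `dma_put`, and `saxpy_tiled` are hypothetical, the buffer capacity is an assumed value, and list slicing stands in for DMA between off-chip memory and a local store such as a Cell SPE's.

```python
# Minimal sketch (all names hypothetical): staging data through a small,
# software-managed local buffer, loosely modeled on a Cell SPE local store.
# "Off-chip" memory is modeled as a Python list; DMA is modeled by slicing.

LOCAL_CAPACITY = 8  # assumed local-buffer capacity, in elements


def dma_get(off_chip, start, count):
    """Hypothetical DMA read: copy `count` elements into a local buffer."""
    return off_chip[start:start + count]


def dma_put(off_chip, start, local):
    """Hypothetical DMA write: copy a local buffer back to off-chip memory."""
    off_chip[start:start + len(local)] = local


def saxpy_tiled(a, x, y, tile=LOCAL_CAPACITY):
    """Compute y = a*x + y with all accesses going through local buffers.

    The loop is tiled so that each tile of x and y fits in local memory,
    then split into a loop over full tiles plus one remainder tile.
    Returns the number of simulated DMA transfers.
    """
    if not 0 < tile <= LOCAL_CAPACITY:
        raise ValueError("tile must fit in local memory")
    n = len(x)
    full = n - n % tile  # index just past the last full tile
    transfers = 0

    def process(start, count):
        nonlocal transfers
        lx = dma_get(x, start, count)  # communication: bring in x tile
        ly = dma_get(y, start, count)  # communication: bring in y tile
        for i in range(count):         # compute only on local data
            ly[i] = a * lx[i] + ly[i]
        dma_put(y, start, ly)          # communication: write back y tile
        transfers += 3

    for start in range(0, full, tile):  # split loop 1: full tiles
        process(start, tile)
    if full < n:                        # split loop 2: remainder tile
        process(full, n - full)
    return transfers
```

For example, with 20 elements and a tile of 8, the sketch processes two full tiles and then a remainder tile of 4, issuing 9 simulated transfers. Splitting off the remainder lets every full-tile iteration use the same fixed transfer size, which is one common motivation for applying loop splitting after tiling.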