Proceedings of the 1993 ACM/IEEE conference on Supercomputing
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Maximizing parallelism and minimizing synchronization with affine transforms
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Memory exploration for low power, embedded systems
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
An optimal memory allocation for application-specific multiprocessor system-on-chip
Proceedings of the 14th international symposium on Systems synthesis
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
An Overview of a Compiler for Scalable Parallel Machines
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
VLSI-SOC '01 Proceedings of the IFIP TC10/WG10.5 Eleventh International Conference on Very Large Scale Integration of Systems-on/Chip: SOC Design Methodologies
Access ordering and memory-conscious cache utilization
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Integrating Loop and Data Transformations for Global Optimisation
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
The energy efficiency of CMP vs. SMT for multimedia workloads
Proceedings of the 18th annual international conference on Supercomputing
The Thrifty Barrier: Energy-Aware Synchronization in Shared-Memory Multiprocessors
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Automatic Partitioning of Parallel Loops for Cache-Coherent Multiprocessors
ICPP '93 Proceedings of the 1993 International Conference on Parallel Processing - Volume 01
Hi-index | 0.00 |
While there have been considerable work in the last couple of years for architecting embedded chip multiprocessors, programming and compiler support required for them took relatively less attention. Our goal in this paper is to show that conventional compiler-directed code parallelization used in high performance computing is not very suitable for embedded chip multiprocessors where minimizing memory space requirements is an important issue. We propose and evaluate a novel memory-conscious loop parallelization strategy with the objective of minimizing the data memory requirements of processors. The proposed approach, which is formulated as a branch-and-bound problem, accomplishes its objective by being careful in selecting the loops to parallelize in a given loop nest.