A memory-conscious code parallelization scheme

Authors:
Liping Xue;Ozcan Ozturk;Mahmut Kandemir
Affiliations:
University of Delaware, Newark, DE;The Pennsylvania State University, University Park, PA;The Pennsylvania State University, University Park, PA
Venue:
Proceedings of the 44th annual Design Automation Conference
Year:
2007

Citing 14
Cited 0

To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
The case for a single-chip multiprocessor

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Maximizing parallelism and minimizing synchronization with affine transforms

Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Memory exploration for low power, embedded systems

Proceedings of the 36th annual ACM/IEEE Design Automation Conference
An optimal memory allocation for application-specific multiprocessor system-on-chip

Proceedings of the 14th international symposium on Systems synthesis
High Performance Compilers for Parallel Computing

High Performance Compilers for Parallel Computing
A Loop Transformation Theory and an Algorithm to Maximize Parallelism

IEEE Transactions on Parallel and Distributed Systems
An Overview of a Compiler for Scalable Parallel Machines

Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Automatic Code-Transformation and Architecture Refinement for Application-Specific Multiprocessor SoCs with Shared Memory

VLSI-SOC '01 Proceedings of the IFIP TC10/WG10.5 Eleventh International Conference on Very Large Scale Integration of Systems-on/Chip: SOC Design Methodologies
Access ordering and memory-conscious cache utilization

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Integrating Loop and Data Transformations for Global Optimisation

PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
The energy efficiency of CMP vs. SMT for multimedia workloads

Proceedings of the 18th annual international conference on Supercomputing
The Thrifty Barrier: Energy-Aware Synchronization in Shared-Memory Multiprocessors

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Automatic Partitioning of Parallel Loops for Cache-Coherent Multiprocessors

ICPP '93 Proceedings of the 1993 International Conference on Parallel Processing - Volume 01

Quantified Score

Hi-index	0.00

Visualization

Abstract

While there have been considerable work in the last couple of years for architecting embedded chip multiprocessors, programming and compiler support required for them took relatively less attention. Our goal in this paper is to show that conventional compiler-directed code parallelization used in high performance computing is not very suitable for embedded chip multiprocessors where minimizing memory space requirements is an important issue. We propose and evaluate a novel memory-conscious loop parallelization strategy with the objective of minimizing the data memory requirements of processors. The proposed approach, which is formulated as a branch-and-bound problem, accomplishes its objective by being careful in selecting the loops to parallelize in a given loop nest.