Integrated code and data placement in two-dimensional mesh based chip multiprocessors

Authors:
Taylan Yemliha;Shekhar Srikantaiah;Mahmut Kandemir;Mustafa Karakoy;Mary Jane Irwin
Affiliations:
Syracuse University, Syracuse, NY;Pennsylvania State University, University Park, PA;Pennsylvania State University, University Park, PA;Imperial College, London;Pennsylvania State University, University Park, PA
Venue:
Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design
Year:
2008

Abstract

As transistors continue to shrink and the number of transistors per chip keeps increasing, chip multiprocessors (CMPs) are becoming a promising way to stay on the current performance trajectory for both high-end and embedded systems. Since future technologies promise to integrate billions of transistors on a chip, having hundreds to thousands of processors on a single chip, along with an underlying memory hierarchy and an interconnection system, is entirely feasible. This paper proposes a compiler-directed integrated code and data placement scheme for two-dimensional mesh-based CMP architectures. The proposed approach uses a Code-Data Affinity Graph (CDAG) to represent the relationship between loop iterations and array data, and then assigns sets of loop iterations to processing cores and sets of data blocks to on-chip memories. The mapping process takes into account the on-chip memory capacity, the load imbalance across cores, and the topology of the network-on-chip (NoC). We present two variants of our approach, depth-first placement (DFP) and breadth-first placement (BFP), and compare them to three alternative code/data mapping schemes. The experimental evaluation shows that our CDAG-based placement schemes are effective in practice, achieving average performance improvements of 19.9% (DFP) and 16.8% (BFP), and average energy improvements of 29.7% (DFP) and 27.8% (BFP).
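The abstract does not spell out the DFP algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes the CDAG is a weighted bipartite graph whose edge weights count how often an iteration set touches a data block, that each mesh node holds one core and one on-chip memory, and that communication cost is access count times Manhattan hop distance. All names (`build_cdag`, `depth_first_place`, `core_slots`, `mem_slots`) are hypothetical; the per-node slot limits stand in for the load-balance and memory-capacity constraints.

```python
from collections import defaultdict


def manhattan(a, b):
    """Hop distance between two nodes of a 2D mesh (assumed cost metric)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def build_cdag(accesses):
    """Build an undirected affinity graph from (iteration_set, block, count) triples."""
    graph = defaultdict(dict)
    for it, blk, count in accesses:
        graph[("code", it)][("data", blk)] = count
        graph[("data", blk)][("code", it)] = count
    return graph


def depth_first_place(graph, rows, cols, core_slots=1, mem_slots=1):
    """Greedy depth-first placement (a sketch, not the paper's DFP).

    core_slots caps iteration sets per core (a crude load-balance proxy);
    mem_slots caps data blocks per on-chip memory (a capacity proxy).
    """
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    free = {"code": {n: core_slots for n in nodes},
            "data": {n: mem_slots for n in nodes}}
    placement = {}

    def nearest_free(kind, anchor):
        candidates = [n for n in nodes if free[kind][n] > 0]
        if not candidates:
            raise ValueError("mesh has no remaining %s capacity" % kind)
        # Ties are broken by node coordinates to keep the result deterministic.
        return min(candidates, key=lambda n: (manhattan(n, anchor), n))

    def visit(vertex, anchor):
        kind = vertex[0]
        node = nearest_free(kind, anchor)
        free[kind][node] -= 1
        placement[vertex] = node
        # Follow the strongest affinity edges first, placing neighbours nearby.
        for nbr, _w in sorted(graph[vertex].items(), key=lambda kv: (-kv[1], kv[0])):
            if nbr not in placement:
                visit(nbr, node)

    for vertex in sorted(graph):
        if vertex not in placement:
            visit(vertex, (0, 0))
    return placement


def communication_cost(graph, placement):
    """Sum of access count times hop distance over all code-to-data edges."""
    return sum(w * manhattan(placement[u], placement[v])
               for u in graph if u[0] == "code"
               for v, w in graph[u].items())


# Toy input: two iteration sets and three data blocks on a 2x2 mesh.
accesses = [(0, "A", 10), (0, "B", 2), (1, "B", 8), (1, "C", 1)]
cdag = build_cdag(accesses)
placement = depth_first_place(cdag, rows=2, cols=2)
print(communication_cost(cdag, placement))  # prints 3
```

In this toy run the heavily shared pairs (iteration set 0 with block A, iteration set 1 with block B) end up on the same mesh node, so only the two light edges cross a link. A breadth-first variant would replace the recursion with a queue, placing all neighbours of a vertex before descending further.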