Access normalization: loop restructuring for NUMA compilers
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Route packets, not wires: on-chip inteconnection networks
Proceedings of the 38th annual Design Automation Conference
Automatic Partitioning of Data and Computations on Scalable Shared Memory Multiprocessors
ICPP '97 Proceedings of the international Conference on Parallel Processing
An Adaptive Approach to Data Placement
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
OCEANS: Optimizing Compilers for Embedded Applications
Euro-Par '97 Proceedings of the Third International Euro-Par Conference on Parallel Processing
Orion: a power-performance simulator for interconnection networks
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
A hierarchical locality algorithm for NUMA compilation
PDP '95 Proceedings of the 3rd Euromicro Workshop on Parallel and Distributed Processing
Automatic computation and data decomposition for multiprocessors
Automatic computation and data decomposition for multiprocessors
Thermal-Aware IP Virtualization and Placement for Networks-on-Chip Architecture
ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Design-Space Exploration of Power-Aware On/Off Interconnection Networks
ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Exploring NoC Mapping Strategies: An Energy and Timing Aware Technique
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Multilevel expansion-based VLSI placement with blockages
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
Introduction to the cell multiprocessor
IBM Journal of Research and Development - POWER5 and packaging
Energy- and performance-aware mapping for regular NoC architectures
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
A data layout optimization framework for NUCA-based multicores
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
As transistor sizes continue to shrink and the number of transistors per chip keeps increasing, chip multiprocessors (CMPs) are becoming a promising alternative to remain on the current performance trajectory for both high-end systems and embedded systems. Since future technologies offer the promise of being able to integrate billions of transistors on a chip, the prospects of having hundreds to thousands of processors on a single chip along with an underlying memory hierarchy and an interconnection system is entirely feasible. This paper proposes a compiler directed integrated code and data placement scheme for two-dimensional mesh based CMP architectures. The proposed approach uses a Code-Data Affinity Graph (CDAG) to represent the relationship between loop iterations and array data and then assigns the sets of loop iterations to processing cores and sets of data blocks to on-chip memories. During the mapping process, the on-chip memory capacity and load imbalance across different cores and the topology of the NoC are taken into account. In this paper, we present two variants of our approach: depth-first placement (DFP) and breadth-first placement (BFP), and compare them to three alternate code/data mapping schemes. The experimental evaluation shows that our CDAG based placement schemes are very successful in practice, achieving average performance improvements of 19.9% (DFP) and 16.8% (BFP), and average energy improvements of 29.7% (DFP) and 27.8% (BFP).