Application mapping for chip multiprocessors

Authors:
Guangyu Chen; Feihui Li; S. W. Son; M. Kandemir
Affiliations:
Microsoft; NVIDIA; Penn State University; Penn State University
Venue:
Proceedings of the 45th annual Design Automation Conference
Year:
2008

Abstract

This paper addresses the problem of automatically mapping an application onto a Network-on-Chip (NoC) based chip multiprocessor (CMP) architecture in a locality-aware fashion. The proposed compiler approach has four major steps: task scheduling, processor mapping, data mapping, and packet routing. In the first step, the application code is parallelized and the resulting parallel threads are assigned to virtual processors. The second step maps virtual processors to physical processors, with the goal of placing threads that are expected to communicate frequently on neighboring processors as much as possible. In the third step, data elements are mapped to the memories attached to CMP nodes; the main objective is to place each data item in a node close to the nodes that access it. The last step determines energy-efficient paths between memories and processors for data to travel. We describe the compiler algorithms we implemented in detail and present an experimental evaluation of the framework, testing both the entire framework and the impact of omitting some of its steps. This analysis shows that the proposed framework significantly reduces the energy consumption of our applications (27.41% on average over a purely performance-oriented application mapping strategy) as a result of improved locality of data accesses.
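
The abstract does not give the algorithms themselves, so the following is only a minimal sketch of what the processor mapping, data mapping, and routing steps might look like, not the authors' method. It assumes a 2D mesh NoC with hop count (Manhattan distance) as the locality cost, a greedy heuristic for placement, and plain dimension-ordered XY routing standing in for the paper's energy-aware path selection. All names (`hops`, `map_processors`, `place_data`, `xy_route`) and the `comm` and `accesses` inputs are hypothetical.

```python
from itertools import product


def hops(a, b):
    """Manhattan distance between two mesh coordinates (x, y)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def map_processors(comm, width, height):
    """Greedily assign virtual processors to mesh nodes.

    comm[(u, v)] is an assumed estimate of traffic between virtual
    processors u and v. Pairs are visited in order of decreasing traffic,
    and each unplaced processor takes the free node that minimizes its
    traffic-weighted distance to the processors already placed.
    The mesh must have at least as many nodes as virtual processors.
    """
    free = sorted(product(range(width), range(height)))
    placement = {}

    def cost(vp, node):
        total = 0
        for (u, v), weight in comm.items():
            if vp in (u, v):
                other = v if u == vp else u
                if other in placement:
                    total += weight * hops(node, placement[other])
        return total

    for (u, v), _ in sorted(comm.items(), key=lambda kv: -kv[1]):
        for vp in (u, v):
            if vp not in placement:
                best = min(free, key=lambda n: cost(vp, n))
                placement[vp] = best
                free.remove(best)
    return placement


def place_data(accesses, placement, width, height):
    """Pick the node whose memory is closest to the accessing processors.

    accesses[vp] is an assumed access count from virtual processor vp.
    """
    nodes = product(range(width), range(height))
    return min(
        nodes,
        key=lambda n: sum(c * hops(n, placement[vp]) for vp, c in accesses.items()),
    )


def xy_route(src, dst):
    """Dimension-ordered route: travel along x first, then along y."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


# Hypothetical example: four threads on a 2x2 mesh.
comm = {(0, 1): 100, (2, 3): 100, (0, 2): 1}
placement = map_processors(comm, 2, 2)
home = place_data({0: 10, 1: 1}, placement, 2, 2)
route = xy_route(home, placement[3])
```

In this example the heavily communicating pairs (0, 1) and (2, 3) end up on adjacent nodes, and the data item is placed at the node of the thread that accesses it most. A real implementation would replace the hop-count cost with an energy model of the links and routers.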