DDT: a research tool for automatic data distribution in high performance Fortran
Scientific Programming - Special issue: High Performance Fortran comes of age
Space-time scheduling of instruction-level parallelism on a raw machine
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Route packets, not wires: on-chip inteconnection networks
Proceedings of the 38th annual Design Automation Conference
Powering networks on chips: energy-efficient and reliable interconnect design for SoCs
Proceedings of the 14th international symposium on Systems synthesis
IEEE Transactions on Parallel and Distributed Systems
Automatic Partitioning of Data and Computations on Scalable Shared Memory Multiprocessors
ICPP '97 Proceedings of the international Conference on Parallel Processing
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Orion: a power-performance simulator for interconnection networks
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Energy characterization of a tiled architecture processor with on-chip networks
Proceedings of the 2003 international symposium on Low power electronics and design
Energy optimization techniques in cluster interconnects
Proceedings of the 2003 international symposium on Low power electronics and design
Managing Power Consumption in Networks on Chip
Proceedings of the conference on Design, automation and test in Europe
Automatic computation and data decomposition for multiprocessors
Automatic computation and data decomposition for multiprocessors
Power-aware communication optimization for networks-on-chips with voltage scalable links
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Static Placement, Dynamic Issue (SPDI) Scheduling for EDGE Architectures
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Key research problems in NoC design: a holistic perspective
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Compiler-directed proactive power management for networks
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Software-directed power-aware interconnection networks
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Compiler-directed channel allocation for saving power in on-chip networks
Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Reducing NoC energy consumption through compiler-directed channel voltage scaling
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Energy- and performance-aware mapping for regular NoC architectures
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Dynamic thread and data mapping for NoC based CMPs
Proceedings of the 46th Annual Design Automation Conference
PM-COSYN: PE and memory co-synthesis for MPSoCs
Proceedings of the Conference on Design, Automation and Test in Europe
A3MAP: architecture-aware analytic mapping for networks-on-chip
Proceedings of the 2010 Asia and South Pacific Design Automation Conference
Characterizing the impact of process variation on 45 nm NoC-based CMPs
Journal of Parallel and Distributed Computing
Studying inter-core data reuse in multicores
Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Studying inter-core data reuse in multicores
ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
Design and analysis of adaptive processor
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
A data layout optimization framework for NUCA-based multicores
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Performance and power aware CMP thread allocation modeling
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Static task mapping for tiled chip multiprocessors with multiple voltage islands
ARCS'12 Proceedings of the 25th international conference on Architecture of Computing Systems
Neighborhood-aware data locality optimization for NoC-based multicores
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
A3MAP: Architecture-aware analytic mapping for networks-on-chip
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special section on verification challenges in the concurrent world
Integrating Memory Optimization with Mapping Algorithms for Multi-Processors System-on-Chip
ACM Transactions on Embedded Computing Systems (TECS)
Cost-effective contention avoidance in a CMP with shared memory controllers
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Distributed memory interface synthesis for network-on-chips with 3D-stacked DRAMs
Proceedings of the International Conference on Computer-Aided Design
Shared memory aware MPSoC software deployment
Proceedings of the Conference on Design, Automation and Test in Europe
Mapping on multi/many-core systems: survey of current and emerging trends
Proceedings of the 50th Annual Design Automation Conference
UNISM: unified scheduling and mapping for general networks on chip
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Thermal-constrained task allocation for interconnect energy reduction in 3-D homogeneous MPSoCs
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
An energy-aware online task mapping algorithm in NoC-based system
The Journal of Supercomputing
Journal of Systems Architecture: the EUROMICRO Journal
Efficient programming paradigm for video streaming processing on TILE64 platform
The Journal of Supercomputing
Model-based cache-aware dispatching of object-oriented software for multicore systems
Journal of Systems and Software
Hi-index | 0.00 |
The problem attacked in this paper is one of automatically mapping an application onto a Network-on-Chip (NoC) based chip multiprocessor (CMP) architecture in a locality-aware fashion. The proposed compiler approach has four major steps: task scheduling, processor mapping, data mapping, and packet routing. In the first step, the application code is parallelized and the resulting parallel threads are assigned to virtual processors. The second step implements a virtual processor-to-physical processor mapping. The goal of this mapping is to ensure that the threads that are expected to communicate frequently with each other are assigned to neighboring processors as much as possible. In the third step, data elements are mapped to memories attached to CMP nodes. The main objective of this mapping is to place a given data item into a node which is close to the nodes that access it. The last step of our approach determines the paths (between memories and processors) for data to travel in an energy efficient manner. In this paper, we describe the compiler algorithms we implemented in detail and present an experimental evaluation of the framework. In our evaluation, we test our entire framework as well as the impact of omitting some of its steps. This experimental analysis clearly shows that the proposed framework reduces energy consumption of our applications significantly (27.41% on average over a pure performance oriented application mapping strategy) as a result of improved locality of data accesses.