Energy-efficient multithreading for a hierarchical heterogeneous multicore through locality-cognizant thread generation

Authors:
Patrick A. La Fratta; Peter M. Kogge
Venue:
Journal of Parallel and Distributed Computing
Year:
2013

Abstract

Energy costs have become increasingly problematic for high-performance processors, but the rising number of on-chip cores offers promising opportunities for energy reduction. Emerging architectures such as heterogeneous multicores present further opportunities for improved energy efficiency. While previous work has presented novel memory architectures, multithreading techniques, and data mapping strategies for reducing energy, thread generation mechanisms that take data locality into account for this purpose have received limited consideration. This study presents methodologies for jointly partitioning data and threads to parallelize sequential codes across an innovative heterogeneous multicore processor called the Passive/Active Multicore (PAM). The goal is to reduce the energy consumed by on-chip data transport and cache accesses while also improving execution time. Experimental results show that the design with automatic thread partitioning reduced the energy-delay product (EDP) by up to 48%.