Energy costs have become increasingly problematic for high-performance processors, but the rising number of on-chip cores offers promising opportunities for energy reduction. Emerging architectures such as heterogeneous multicores present further opportunities for improved energy efficiency. Previous work has presented novel memory architectures, multithreading techniques, and data mapping strategies for reducing energy. However, little attention has been paid to thread generation mechanisms that take data locality into account for this purpose. This study presents methodologies for jointly partitioning data and threads to parallelize sequential codes across an innovative heterogeneous multicore processor, the Passive/Active Multicore (PAM). The goal is to reduce the energy consumed by on-chip data transport and cache accesses while also improving execution time. Experimental results show that the design with automatic thread partitioning reduced the energy-delay product (EDP) by up to 48%.
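The abstract does not describe the partitioning algorithm itself. The sketch below is a generic, owner-computes-style heuristic assumed purely for illustration, not the PAM methodology. It places each thread on the core that owns most of the data the thread touches, which keeps data transport local. It also computes EDP as energy multiplied by execution time. The input names (`thread_accesses`, `block_owner`) and all numbers are hypothetical. An EDP reduction of 48% means the candidate's EDP is 52% of the baseline's.

```python
from collections import Counter


def edp(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product: energy consumed multiplied by execution time."""
    return energy_joules * delay_seconds


def edp_reduction(baseline_edp: float, candidate_edp: float) -> float:
    """Fractional EDP reduction of a candidate design relative to a baseline."""
    return 1.0 - candidate_edp / baseline_edp


def assign_threads(thread_accesses, block_owner):
    """Place each thread on the core that owns most of the data it accesses.

    Hypothetical inputs (illustration only):
      thread_accesses: dict thread -> dict data block -> access count
      block_owner:     dict data block -> core id (the data partition)
    """
    placement = {}
    for thread, accesses in thread_accesses.items():
        per_core = Counter()
        for block, count in accesses.items():
            # Credit each access to the core holding that block.
            per_core[block_owner[block]] += count
        # most_common(1) returns the core with the most accesses;
        # ties go to the core encountered first.
        placement[thread] = per_core.most_common(1)[0][0]
    return placement


if __name__ == "__main__":
    owners = {"a": 0, "b": 0, "c": 1}
    accesses = {"t0": {"a": 5, "c": 2}, "t1": {"b": 1, "c": 9}}
    print(assign_threads(accesses, owners))
    # Illustrative numbers: same runtime, 52% of the baseline energy.
    print(edp_reduction(edp(2.0, 1.0), edp(1.04, 1.0)))
```

With these made-up inputs, `t0` lands on core 0 and `t1` on core 1, and the EDP reduction is 0.48. A real joint partitioner would also choose `block_owner` itself rather than taking it as fixed input.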