Guided self-scheduling: A practical scheduling scheme for parallel supercomputers
IEEE Transactions on Computers
JouleTrack: a web based tool for software energy profiling
Proceedings of the 38th annual Design Automation Conference
SPEComp: A New Benchmark Suite for Measuring Parallel Computer Performance
WOMPAT '01 Proceedings of the International Workshop on OpenMP Applications and Tools: OpenMP Shared Memory Parallel Programming
Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
Proceedings of the 31st annual international symposium on Computer architecture
The Impact of Performance Asymmetry in Emerging Multicore Architectures
Proceedings of the 32nd annual international symposium on Computer Architecture
New Generation of Predictive Technology Model for Sub-45nm Design Exploration
ISQED '06 Proceedings of the 7th International Symposium on Quality Electronic Design
Performance, Power Efficiency and Scalability of Asymmetric Cluster Chip Multiprocessors
IEEE Computer Architecture Letters
Core fusion: accommodating software diversity in chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Impact of process variations on multicore performance symmetry
Proceedings of the conference on Design, automation and test in Europe
Thousand core chips: a technology perspective
Proceedings of the 44th annual Design Automation Conference
Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Amdahl's Law in the Multicore Era
Computer
Accomodating Diversity in CMPs with Heterogeneous Frequencies
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Accelerating critical section execution with asymmetric multi-core architectures
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
The BubbleWrap many-core: popping cores for sequential acceleration
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Conservation cores: reducing the energy of mature computations
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Improving energy efficiency of multi-threaded applications using heterogeneous CMOS-TFET multicores
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
An energy-efficient heterogeneous CMP based on hybrid TFET-CMOS cores
Proceedings of the 48th Design Automation Conference
Variation-tolerant ultra low-power heterojunction tunnel FET SRAM design
NANOARCH '11 Proceedings of the 2011 IEEE/ACM International Symposium on Nanoscale Architectures
Lighting the dark silicon by exploiting heterogeneity on future processors
Proceedings of the 50th Annual Design Automation Conference
Run-time adaption for highly-complex multi-core systems
Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Hi-index | 0.00 |
Device level heterogeneity promises high energy efficiency over a larger range of voltages than a single device technology alone can provide. In this paper, starting from device models, we first present ground-up modeling of CMOS and TFET cores, and verify this model against existing processors. Using our core models, we construct a 32-core TFET-CMOS heterogeneous multicore. We then show that it is a very challenging task to identify the ideal runtime configuration to use in such a heterogeneous multicore, which includes finding the best number/type of cores to activate and the corresponding voltages/frequencies to select for these cores. In order to effectively utilize this heterogeneous processor, we propose a novel automated runtime scheme. Our scheme is designed to automatically improve the performance of applications running on heterogeneous CMOS-TFET multicores operating under a fixed power budget, without requiring any effort from the application programmer or the user. Our scheme combines heterogeneous thread-to-core mapping, dynamic work partitioning, and dynamic power partitioning to identify energy efficient operating points. With simulations we show that our runtime scheme can enable a CMOS-TFET multicore to serve a diversity of workloads with high energy efficiency and achieve 21% average speedup over the best performing equivalent homogeneous multicore.