A HW/SW co-designed heterogeneous multi-core virtual machine for energy-efficient general purpose computing

Authors:
Youfeng Wu; Shiliang Hu; Edson Borin; Cheng Wang
Affiliations:
Programming Systems Lab, Intel Labs, 2200 Mission College Blvd, Santa Clara, CA 95052; Programming Systems Lab, Intel Labs, 2200 Mission College Blvd, Santa Clara, CA 95052; Institute of Computing - University of Campinas, Av. Albert Einstein, 1251 - Campinas/Brazil; Programming Systems Lab, Intel Labs, 2200 Mission College Blvd, Santa Clara, CA 95052
Venue:
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Year:
2011

Abstract

Improving single-thread performance is increasingly challenging because power and energy consumption have become a major barrier to significantly higher performance in general-purpose cores. General-purpose processors are designed to perform well across a wide variety of market segments, at the cost of significantly lower performance per watt than special-purpose processors that target limited applications or market segments. In this paper, we propose a HW/SW co-designed heterogeneous multi-core virtual machine, called TwinPeaks, which integrates a set of less general but power-efficient cores and uses dynamic binary optimization to schedule code regions to run on the most efficient cores. Our experiments and analysis indicate that TwinPeaks with a wide in-order core and a narrow out-of-order core may achieve 108% of the performance at ~71% of the energy of a big 4-wide out-of-order core.