Improving single-thread performance is increasingly challenging because power and energy consumption have become a major barrier to achieving significantly higher performance on general-purpose cores. General-purpose processors are designed to perform well across a wide variety of market segments, at the cost of significantly lower performance per watt than special-purpose processors that target a limited set of applications or market segments. In this paper, we propose TwinPeaks, a HW/SW co-designed heterogeneous multi-core virtual machine that integrates a set of less general but more power-efficient cores and uses dynamic binary optimization to schedule code regions onto the most efficient core. Our experiments and analysis indicate that TwinPeaks, configured with a wide in-order core and a narrow out-of-order core, may achieve 108% of the performance at about 71% of the energy of a big 4-wide out-of-order core.
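To put the reported figures in terms of the performance-per-watt metric the abstract uses as motivation, the following is a minimal sketch of the arithmetic. It assumes, beyond what the abstract states, that "performance" means speedup on a fixed workload (inverse execution time) and that "energy" means total energy for that same workload. The function name `relative_metrics` and its dictionary keys are hypothetical, chosen only for illustration.

```python
def relative_metrics(speedup: float, energy_ratio: float) -> dict:
    """Derive ratios relative to a baseline core from speedup and energy.

    Assumptions (not stated in the abstract): both inputs refer to the same
    fixed workload, speedup = baseline_time / new_time, and
    energy_ratio = new_energy / baseline_energy.
    """
    delay_ratio = 1.0 / speedup                # execution time vs. baseline
    power_ratio = energy_ratio / delay_ratio   # average power = energy / time
    return {
        "delay_ratio": delay_ratio,
        "avg_power_ratio": power_ratio,
        "perf_per_watt_ratio": speedup / power_ratio,
        "energy_delay_ratio": energy_ratio * delay_ratio,
    }


# Figures quoted in the abstract: 108% performance at about 71% energy.
metrics = relative_metrics(speedup=1.08, energy_ratio=0.71)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, the quoted figures correspond to roughly 0.77x the average power, 1.41x the performance per watt, and 0.66x the energy-delay product of the baseline core. Note that, with this definition, the performance-per-watt ratio reduces to the inverse of the energy ratio, so it does not depend on the speedup.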