Energy efficient speculative threads: dynamic thread allocation in Same-ISA heterogeneous multicore systems

Authors:
Yangchun Luo;Venkatesan Packirisamy;Wei-Chung Hsu;Antonia Zhai
Affiliations:
University of Minnesota Twin Cities, Minneapolis, MN, USA;NVIDIA Corporation, Santa Clara, CA, USA;National Chiao Tung University, Hsinchu, Taiwan Roc;University of Minnesota Twin Cities, Minneapolis, MN, USA
Venue:
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Year:
2010

Citing 45
Cited 5

The expandable split window paradigm for exploiting fine-grain parallelsim

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Single-program speculative multithreading (SPSM) architecture: compiler-assisted fine-grained multithreading

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading

ACM Transactions on Computer Systems (TOCS)
The SimpleScalar tool set, version 2.0

ACM SIGARCH Computer Architecture News
Task selection for a multiscalar processor

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A dynamic multithreading processor

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Data speculation support for a chip multiprocessor

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Clustered speculative multithreaded processors

ICS '99 Proceedings of the 13th international conference on Supercomputing
The Superthreaded Processor Architecture

IEEE Transactions on Computers
An architecture for mostly functional languages

LFP '86 Proceedings of the 1986 ACM conference on LISP and functional programming
A scalable approach to thread-level speculation

Proceedings of the 27th annual international symposium on Computer architecture
Wattch: a framework for architectural-level power analysis and optimizations

Proceedings of the 27th annual international symposium on Computer architecture
Techniques for speculative run-time parallelization of loops

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Compiler optimization of scalar value communication between speculative threads

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Orion: a power-performance simulator for interconnection networks

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
In Search of Speculative Thread-Level Parallelism

PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
The Need for Fast Communication in Hardware-Based Speculative Chip Multiprocessors

PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Thread-Spawning Schemes for Speculative Multithreading

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Eliminating Squashes Through Learning Cross-Thread Violations in Speculative Parallelization for Multiprocessors

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Exploiting Choice in Resizable Cache Design to Optimize Deep-Submicron Processor Energy-Delay

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Min-cut program decomposition for thread-level speculation

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance

Proceedings of the 31st annual international symposium on Computer architecture
Voltage and Frequency Control With Adaptive Reaction Time in Multiple-Clock-Domain Processors

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Pin: building customized program analysis tools with dynamic instrumentation

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Thread-Level Speculation on a CMP can be energy efficient

Proceedings of the 19th annual international conference on Supercomputing
A Dynamic Compilation Framework for Controlling Microprocessor Energy and Performance

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
The Danger of Interval-Based Power Efficiency Metrics: When Worst Is Best

IEEE Computer Architecture Letters
POSH: a TLS compiler that exploits program structure

Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Dynamic thread assignment on heterogeneous multiprocessor architectures

Proceedings of the 3rd conference on Computing frontiers
Core architecture optimization for heterogeneous chip multiprocessors

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Hardware support for spin management in overcommitted virtual machines

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Computation spreading: employing hardware migration to specialize CMP cores on-the-fly

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Speculative thread decomposition through empirical optimization

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Per-thread cycle accounting in SMT processors

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Accelerating critical section execution with asymmetric multi-core architectures

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors

Proceedings of the 36th annual international symposium on Computer architecture
Dynamic performance tuning for speculative threads

Proceedings of the 36th annual international symposium on Computer architecture
Core-Selectability in Chip Multiprocessors

PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Detecting phases in parallel applications on shared memory architectures

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Supporting speculative multithreading on simultaneous multithreaded processors

HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Loop selection for thread-level speculation

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing

Chameleon: operating system support for dynamic processors

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Improving performance per watt of asymmetric multi-core processors via online program phase classification and adaptive core morphing

ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special section on adaptive power management for energy and temperature-aware computing systems
Energy-efficient multithreading for a hierarchical heterogeneous multicore through locality-cognizant thread generation

Journal of Parallel and Distributed Computing
The design and implementation of heterogeneous multicore systems for energy-efficient speculative thread execution

ACM Transactions on Architecture and Code Optimization (TACO)
StreaMorph: a case for synthesizing energy-efficient adaptive programs using high-level abstractions

Proceedings of the Eleventh ACM International Conference on Embedded Software

Quantified Score

Hi-index	0.00

Visualization

Abstract

Thread-level parallelism at the chip level is critical in overcoming some of the challenges that have been ushered in through the advent of modern multicore processors (CMP). Extracting speculatively parallel threads from sequential applications and executing these threads on multicore processors is a promising technique to speed up these applications on multicore systems. However, the potential degradation in energy efficiency associated is an important factor that hinders the deployment of this technique. For multicore systems that integrate same-ISA heterogeneous cores, it is possible to judiciously allocate speculative threads to achieve energy-efficient performance improvement. In this paper, we examine multicore systems with multiple same-ISA heterogeneous cores, some of which supporting simultaneous multithreading. In this environment, we propose thread-allocation mechanisms that dynamically determine how speculative threads are allocated. The proposed mechanisms can potentially allow heterogeneous multicore systems to aim to achieve significant performance improvement with moderate energy increase. At run time, for each segment of speculative parallel execution and sequential execution, the thread-allocation mechanisms make the following three decisions: (i) whether the speculative parallel threads should be deployed to a single core with SMT support or to multiple cores each supporting a single thread of execution; (ii) whether the parallel/sequential threads should utilize more powerful cores with a high issue width or a less powerful core with low issue width; (iii) whether the L1 caches should be fully activated or partially activated. The proposed thread-allocation mechanisms migrate threads and/or re-size L1 caches to maximize energy efficiency (measured in ED2P), based on these decisions. Throttling mechanisms have been incorporated in the proposed system to suppress thread management operations when the performance/energy benefit of these operations cannot justify the associated overhead. By evaluating speculatively parallelized benchmarks from SPEC CPU 2006 and 2000, we found that the proposed heterogeneous multicore system with dynamic thread management is 13% more energy efficient, in terms of ED2P, than the most energy-efficient homogeneous system. This corresponds to 4% performance improvement and 6% reduction in energy consumption. When compare to a four-issue superscalar core that execute the unmodified sequential program with a fixed L1 cache size, the proposed system is 44% more energy efficient, in terms of ED2P. This corresponds to a 38% performance improvement with 6% increase in energy consumption.