Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-thread Applications

Authors:
Hongtao Zhong;Steven A. Lieberman;Scott A. Mahlke
Affiliations:
Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor, MI 48109. hongtaoz@umich.edu;Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor, MI 48109. lieberm@umich.edu;Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor, MI 48109. mahlke@umich.edu
Venue:
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Year:
2007

Abstract

Chip multiprocessors with multiple simpler cores are gaining popularity because they have the potential to drive future performance gains without exacerbating the problems of power dissipation and complexity. Current chip multiprocessors increase throughput by utilizing multiple cores to perform computation in parallel. These designs provide real benefits for server-class applications that are explicitly multi-threaded. However, for desktop and other systems where single-thread applications dominate, multicore systems have yet to offer much benefit. Chip multiprocessors are most efficient at executing coarse-grain threads that have little communication. However, general-purpose applications do not provide many opportunities for identifying such threads, due to frequent use of pointers, recursive data structures, if-then-else branches, small function bodies, and loops with small trip counts. To attack this mismatch, this paper proposes a multicore architecture, referred to as Voltron, that extends traditional multicore systems in two ways. First, it provides a dual-mode scalar operand network to enable efficient inter-core communication and lightweight synchronization. Second, Voltron can organize the cores for execution in either coupled or decoupled mode. In coupled mode, the cores execute multiple instruction streams in lock-step to collectively function as a wide-issue VLIW. In decoupled mode, the cores execute a set of fine-grain communicating threads extracted by the compiler. This paper describes the Voltron architecture and associated compiler support for orchestrating bi-modal execution.
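
The contrast between the two execution modes can be illustrated with a small software analogy. The sketch below is not Voltron's hardware, ISA, or compiler output; it is a toy model under stated assumptions. The names `run_coupled`, `run_decoupled`, `make_streams`, and `NOP` are hypothetical, the blocking queue merely stands in for inter-core communication over an operand network, and the sum-of-squares workload is chosen only for illustration.

```python
import queue
import threading

NOP = None  # hypothetical padding slot for a core with nothing to issue


def make_streams(data, n_cores=2):
    """Split the work round-robin into one equal-length op stream per core."""
    streams = [[(lambda x=x: x * x) for x in data[c::n_cores]]
               for c in range(n_cores)]
    width = max(len(s) for s in streams)
    return [s + [NOP] * (width - len(s)) for s in streams]


def run_coupled(streams):
    """Coupled-mode analogy: every core issues one op per cycle, in lock-step."""
    acc = [0] * len(streams)
    for bundle in zip(*streams):  # one bundle = one slot per core
        for core, op in enumerate(bundle):
            if op is not NOP:
                acc[core] += op()
    return sum(acc)


def run_decoupled(data):
    """Decoupled-mode analogy: two threads run at their own pace and
    synchronize only through a blocking send/receive channel."""
    link = queue.Queue()  # assumption: stands in for inter-core messaging
    result = []

    def producer():
        for x in data:
            link.put(x * x)  # "send" a scalar operand
        link.put(None)       # end-of-stream marker

    def consumer():
        total = 0
        while True:
            value = link.get()  # "receive" blocks until data arrives
            if value is None:
                break
            total += value
        result.append(total)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]
    print(run_coupled(make_streams(data)))
    print(run_decoupled(data))
```

In this analogy the coupled version needs no explicit synchronization because all streams advance together, while the decoupled version lets each thread slip relative to the other and pays for communication only at the send and receive points.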