Composable Lightweight Processors

Authors:
Changkyu Kim; Simha Sethumadhavan; M. S. Govindan; Nitya Ranganathan; Divya Gulati; Doug Burger; Stephen W. Keckler
Venue:
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2007

Citing 0
Cited 34

Counting Dependence Predictors
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture

Federation: repurposing scalar cores for out-of-order instruction issue
Proceedings of the 45th annual Design Automation Conference

Multitasking workload scheduling on flexible core chip multiprocessors
ACM SIGARCH Computer Architecture News

Multitasking workload scheduling on flexible-core chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques

Adapting Application Mapping to Systematic Within-Die Process Variations on Chip Multiprocessors
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers

An evaluation of the TRIPS computer system
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems

Strategies for mapping dataflow blocks to distributed hardware
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture

Efficient program scheduling for heterogeneous multi-core processors
Proceedings of the 46th Annual Design Automation Conference

Polymorphic pipeline array: a flexible multicore accelerator with virtualized execution for mobile multimedia applications
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture

Proposition for a sequential accelerator in future general-purpose manycore processors and the problem of migration-induced cache misses
Proceedings of the 7th ACM international conference on Computing frontiers

WiDGET: Wisconsin decoupled grid execution tiles
Proceedings of the 37th annual international symposium on Computer architecture

Forwardflow: a scalable core for power-constrained CMPs
Proceedings of the 37th annual international symposium on Computer architecture

Resource recycling: putting idle resources to work on a composable accelerator
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems

Federation: Boosting per-thread performance of throughput-oriented manycore architectures
ACM Transactions on Architecture and Code Optimization (TACO)

Dynamic vectorization in the E2 dynamic multicore architecture
ACM SIGARCH Computer Architecture News

Erasing Core Boundaries for Robust and Configurable Performance
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture

Dark silicon and the end of multicore scaling
Proceedings of the 38th annual international symposium on Computer architecture

CoreSymphony: an efficient reconfigurable multi-core architecture
ACM SIGARCH Computer Architecture News

Bahurupi: A polymorphic heterogeneous multi-core architecture
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers

Global register alias table: Boosting sequential program on multi-core
Future Generation Computer Systems

Dynamic resource tuning for flexible core chip multiprocessors
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II

Distributed replay protocol for distributed uniprocessors
Proceedings of the 26th ACM international conference on Supercomputing

Overcoming single-thread performance hurdles in the core fusion reconfigurable multicore architecture
Proceedings of the 26th ACM international conference on Supercomputing

CRQ-based fair scheduling on composable multicore architectures
Proceedings of the 26th ACM international conference on Supercomputing

Power Limitations and Dark Silicon Challenge the Future of Multicore
ACM Transactions on Computer Systems (TOCS)

Harmony: collection and analysis of parallel block vectors
Proceedings of the 39th Annual International Symposium on Computer Architecture

Improving performance per watt of asymmetric multi-core processors via online program phase classification and adaptive core morphing
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special section on adaptive power management for energy and temperature-aware computing systems

Power challenges may end the multicore era
Communications of the ACM

MorphCore: An Energy-Efficient Microarchitecture for High Performance ILP and High Throughput TLP
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture

Composite Cores: Pushing Heterogeneity Into a Core
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture

DRMA: dynamically reconfigurable MPSoC architecture
Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI

The sharing architecture: sub-core configurability for IaaS clouds
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

The benefit of SMT in the multi-core era: flexibility towards degrees of thread-level parallelism
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

StreaMorph: a case for synthesizing energy-efficient adaptive programs using high-level abstractions
Proceedings of the Eleventh ACM International Conference on Embedded Software

Quantified Score

Hi-index	0.02

Abstract

Modern chip multiprocessors (CMPs) are designed to exploit both instruction-level parallelism (ILP) within processors and thread-level parallelism (TLP) within and across processors. However, the number of processors and the granularity of each processor are fixed at design time. This paper evaluates a flexible architectural approach, called Composable Lightweight Processors (or CLPs), that allows simple, low-power cores to be aggregated together dynamically, forming larger, more powerful single-threaded processors without changing the application binary. We evaluate one such design with 32 cores called TFlex, which can be configured as 32 dual-issue processors, or as a single 64-wide issue processor, or as any point in between. Use of an Explicit Data Graph Execution (EDGE) ISA enables the system to be fully composable, with no monolithic structures spanning the cores. Simulation results show that CLPs achieve an average performance boost of 42%, an average area-efficiency of 3.4x, and an average power-efficiency of 2x over a fixed architecture on a spectrum of single-threaded applications. Results also show that CLPs outperform a spectrum of fixed CMP architectures on a set of multitasking workloads.
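
The configuration arithmetic in the abstract can be illustrated with a small sketch. The core count (32) and per-core issue width (2) come from the abstract. The function names, and the restriction to equal-sized logical processors that together use all 32 cores, are assumptions made here for illustration only; the abstract does not say which intermediate compositions TFlex supports.

```python
TOTAL_CORES = 32     # from the abstract: TFlex has 32 cores
ISSUE_PER_CORE = 2   # from the abstract: each core is dual-issue


def composed_issue_width(cores_per_processor: int) -> int:
    """Aggregate issue width of one logical processor built from N cores."""
    if not 1 <= cores_per_processor <= TOTAL_CORES:
        raise ValueError("cores_per_processor must be between 1 and 32")
    return cores_per_processor * ISSUE_PER_CORE


def uniform_configurations() -> list[tuple[int, int, int]]:
    """List (logical processors, cores each, issue width each).

    Assumption (not from the abstract): every logical processor has the
    same size and all 32 cores are used, so sizes are divisors of 32.
    """
    configs = []
    for size in range(1, TOTAL_CORES + 1):
        if TOTAL_CORES % size == 0:
            count = TOTAL_CORES // size
            configs.append((count, size, composed_issue_width(size)))
    return configs


if __name__ == "__main__":
    for count, size, width in uniform_configurations():
        print(f"{count:2d} processor(s) x {size:2d} core(s) -> {width}-wide issue each")
```

Under these assumptions the two endpoints match the abstract: 32 dual-issue processors at one extreme and a single 64-wide issue processor at the other.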