ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A Chip-Multiprocessor Architecture with Speculative Multithreading
IEEE Transactions on Computers
IEEE Micro
A Thermal-Aware Superscalar Microprocessor
ISQED '02 Proceedings of the 3rd International Symposium on Quality Electronic Design
Temperature-aware microarchitecture
Proceedings of the 30th annual international symposium on Computer architecture
Reducing power density through activity migration
Proceedings of the 2003 international symposium on Low power electronics and design
Circuit and microarchitectural techniques for reducing cache leakage power
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Heat-and-run: leveraging SMT and CMP to manage power density through the operating system
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Mitigating Amdahl's Law through EPI Throttling
Proceedings of the 32nd annual international symposium on Computer Architecture
The Impact of Performance Asymmetry in Emerging Multicore Architectures
Proceedings of the 32nd annual international symposium on Computer Architecture
Reducing the Latency and Area Cost of Core Swapping through Shared Helper Engines
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Heterogeneous Chip Multiprocessors
Computer
Performance implications of single thread migration on a chip multi-core
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Performance, Power Efficiency and Scalability of Asymmetric Cluster Chip Multiprocessors
IEEE Computer Architecture Letters
Techniques for Multicore Thermal Management: Classification and New Exploration
Proceedings of the 33rd annual international symposium on Computer Architecture
Core fusion: accommodating software diversity in chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
A study of thread migration in temperature-constrained multicores
ACM Transactions on Architecture and Code Optimization (TACO)
Thousand core chips: a technology perspective
Proceedings of the 44th annual Design Automation Conference
Paceline: Improving Single-Thread Performance in Nanoscale CMPs through Core Overclocking
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Composable Lightweight Processors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
The shared-thread multiprocessor
Proceedings of the 22nd annual international conference on Supercomputing
Accelerating critical section execution with asymmetric multi-core architectures
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Thread motion: fine-grained power management for multi-core systems
Proceedings of the 36th annual international symposium on Computer architecture
The BubbleWrap many-core: popping cores for sequential acceleration
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Understanding the Thermal Implications of Multi-Core Architectures
IEEE Transactions on Parallel and Distributed Systems
Global register alias table: Boosting sequential program on multi-core
Future Generation Computer Systems
Hi-index | 0.00 |
As the number of transistors on a chip doubles with every technology generation, the number of on-chip cores also increases rapidly, making possible in a foreseeable future to design processors featuring hundreds of general-purpose cores. However, though a large number of cores speeds up parallel code sections, Amdahl's law requires speeding up sequential sections too. We argue that it will become possible to dedicate a substantial fraction of the chip area and power budget to achieve high sequential performance. Current general-purpose processors contain a handful of cores designed to be continuously active and run in parallel. This leads to power and thermal constraints that limit the core's performance. We propose removing these constraints with a sequential accelerator (SACC). A SACC consists of several cores designed for ultimate sequential performance. These cores cannot run continuously. A single core is active at any time, the rest of the cores are inactive and power-gated. We migrate the execution periodically to another core to spread heat generation uniformly over the whole SACC area, thus addressing the temperature issue. The SACC will be viable only if it yields significant sequential performance. Migration-induced cache misses may limit performance gains. We propose some solutions to mitigate this problem. We also investigate a migration method using thermal sensors, such that the migration interval depends on the ambient temperature and the migration penalty is negligible under normal thermal conditions.