Understanding fundamental design choices in single-ISA heterogeneous multicore architectures

Authors:
Kenzo Van Craeynest;Lieven Eeckhout
Affiliations:
Ghent University, Belgium;Ghent University, Belgium
Venue:
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Year:
2013

Citing 31
Cited 1

Symbiotic jobscheduling for a simultaneous multithreaded processor

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Automatically characterizing large scale program behavior

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance

Proceedings of the 31st annual international symposium on Computer architecture
Best of Both Latency and Throughput

ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Scheduling for heterogeneous processors in server systems

Proceedings of the 2nd conference on Computing frontiers
Pin: building customized program analysis tools with dynamic instrumentation

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Niagara: A 32-Way Multithreaded Sparc Processor

IEEE Micro
Mitigating Amdahl's Law through EPI Throttling

Proceedings of the 32nd annual international symposium on Computer Architecture
Introduction to the cell multiprocessor

IBM Journal of Research and Development - POWER5 and packaging
Core architecture optimization for heterogeneous chip multiprocessors

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Thousand core chips: a technology perspective

Proceedings of the 44th annual Design Automation Conference
Understanding and Designing New Server Architectures for Emerging Warehouse-Computing Environments

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
System-Level Performance Metrics for Multiprogram Workloads

IEEE Micro
Amdahl's Law in the Multicore Era

Computer
CPR: Composable performance regression for scalable multiprocessor models

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
HASS: a scheduler for heterogeneous multicore systems

ACM SIGOPS Operating Systems Review
Efficient program scheduling for heterogeneous multi-core processors

Proceedings of the 46th Annual Design Automation Conference
Evaluation techniques for storage hierarchies

IBM Systems Journal
Bias scheduling in heterogeneous multi-core architectures

Proceedings of the 5th European conference on Computer systems
Web search using mobile cores: quantifying and mitigating the price of efficiency

Proceedings of the 37th annual international symposium on Computer architecture
Challenges and Opportunities for Extremely Energy-Efficient Processors

IEEE Micro
Scalable thread scheduling and global power management for heterogeneous many-core architectures

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Fast modeling of shared caches in multicore systems

Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Efficiently exploiting memory level parallelism on asymmetric coupled cores in the dark silicon era

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Scheduling heterogeneous multi-cores through Performance Impact Estimation (PIE)

Proceedings of the 39th Annual International Symposium on Computer Architecture
The Multi-Program Performance Model: Debunking current practice in multi-core simulation

IISWC '11 Proceedings of the 2011 IEEE International Symposium on Workload Characterization

ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors

Proceedings of Workshop on General Purpose Processing Using GPUs

Quantified Score

Hi-index	0.00

Visualization

Abstract

Single-ISA heterogeneous multicore processors have gained substantial interest over the past few years because of their power efficiency, as they offer the potential for high overall chip throughput within a given power budget. Prior work in heterogeneous architectures has mainly focused on how heterogeneity can improve overall system throughput. To what extent heterogeneity affects per-program performance has remained largely unanswered. In this article, we aim at understanding how heterogeneity affects both chip throughput and per-program performance; how heterogeneous architectures compare to homogeneous architectures under both performance metrics; and how fundamental design choices, such as core type, cache size, and off-chip bandwidth, affect performance. We use analytical modeling to explore a large space of single-ISA heterogeneous architectures. The analytical model has linear-time complexity in the number of core types and programs of interest, and offers a unique opportunity for exploring the large space of both homogeneous and heterogeneous multicore processors in limited time. Our analysis provides several interesting insights: While it is true that heterogeneity can improve system throughput, it fundamentally trades per-program performance for chip throughput; although some heterogeneous configurations yield better throughput and per-program performance than homogeneous designs, some homogeneous configurations are optimal for particular throughput versus per-program performance trade-offs. Two core types provide most of the benefits from heterogeneity and a larger number of core types does not contribute much; job-to-core mapping is both important and challenging for heterogeneous multicore processors to achieve optimum performance. Limited off-chip bandwidth does alter some of the fundamental design choices in heterogeneous multicore architectures, such as the need for large on-chip caches for achieving high throughput, and per-program performance degrading more relative to throughput under constrained off-chip bandwidth.