Multicore performance optimization using partner cores

Authors:
Eric Lau; Jason E. Miller; Inseok Choi; Donald Yeung; Saman Amarasinghe; Anant Agarwal
Affiliations:
MIT Computer Science and Artificial Intelligence Laboratory; MIT Computer Science and Artificial Intelligence Laboratory; University of Maryland, Dept. of Electrical and Computer Engineering; University of Maryland, Dept. of Electrical and Computer Engineering; MIT Computer Science and Artificial Intelligence Laboratory; MIT Computer Science and Artificial Intelligence Laboratory
Venue:
HotPar'11: Proceedings of the 3rd USENIX Conference on Hot Topics in Parallelism
Year:
2011

Abstract

As the push for parallelism continues to increase the number of cores on a chip, system design has become extremely complex, and optimizing for both performance and power efficiency is now nearly impossible for the application programmer. To assist the programmer, a variety of techniques for optimizing performance and power at runtime have been developed, but many of them rely on speculative threads or performance counters. These approaches either steal cycles from the application or occupy an extra core, and such expensive penalties can greatly reduce the potential gains. While general-purpose processors have grown larger and more complex, technologies for smaller embedded processors have pushed toward energy efficiency. In this paper, we combine the two and introduce the concept of Partner Cores: low-area, low-power cores paired with larger, faster compute cores. Each main processing core is tightly coupled to its own partner core, which allows the partner to perform optimizations and functions that are impossible on a traditional chip multiprocessor. This paper demonstrates that optimization code running on a partner core can increase performance and provide a net improvement in power efficiency.