High-Performance Throughput Computing

Authors:
Shailender Chaudhry;Paul Caprioli;Sherman Yip;Marc Tremblay
Affiliations:
Sun Microsystems;Sun Microsystems;Sun Microsystems;Sun Microsystems
Venue:
IEEE Micro
Year:
2005

Abstract

Throughput computing, achieved through multithreading and multicore technology, can lead to performance improvements that are 10 to 30× those of conventional processors and systems. However, such systems should also offer good single-thread performance. Here, the authors show that hardware scouting increases the performance of an already robust core by up to 40 percent for commercial benchmarks.