A multithreaded PowerPC processor for commercial servers

  • Authors:
  • J. M. Borkenhagen, IBM Server Group, Rochester, Minnesota
  • R. J. Eickemeyer, IBM Server Group, Rochester, Minnesota
  • R. N. Kalla, IBM Server Group, Austin, Texas
  • S. R. Kunkel, IBM Server Group, Rochester, Minnesota

  • Venue: IBM Journal of Research and Development
  • Year: 2000

Abstract

This paper describes the microarchitecture of the RS64 IV, a multithreaded PowerPC® processor, and its memory system. Because the processor is used only in IBM iSeries™ and pSeries™ commercial servers, it is optimized solely for commercial server workloads. Miss rates are rising with trends in commercial server applications, and the latency of each miss is growing with rapidly increasing clock frequency; the two trends compound, so cache misses waste an ever larger portion of execution time. Several optimizations in the processor design address this problem. The most significant is coarse-grained multithreading, which enables the processor to execute useful instructions during cache misses. It provides a significant throughput increase while adding less than 5% to the chip area and having very little impact on cycle time; compared with other performance-improvement techniques, multithreading yields an excellent ratio of performance gain to implementation cost. Second, the miss rate of the L2 cache is reduced by making it four-way set-associative. Third, the latency of cache-to-cache movement of data is minimized. Fourth, the L1 caches are relatively large. Beyond cache misses, pipeline "holes" caused by branches are minimized with large instruction buffers, large L1 I-cache fetch bandwidth, and optimized resolution of branch direction; branches resolve quickly in part because the pipeline is short but efficient. To minimize pipeline holes due to data dependencies, the L1 D-cache access is optimized to yield a one-cycle load-to-use penalty.
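To make the central idea concrete, the sketch below models coarse-grained multithreading in C: the core runs one thread until that thread takes a cache miss, then switches to the other hardware thread so the miss latency is overlapped with useful work. The two-thread count matches the RS64 IV, but the miss latency, the miss pattern, and all names here are illustrative assumptions, not details taken from the paper.

    /*
     * Minimal sketch (not the RS64 IV design itself) of coarse-grained
     * multithreading: run the active thread until it misses in the cache,
     * then switch to the other hardware thread while the miss is serviced.
     */
    #include <stdio.h>
    #include <stdbool.h>

    #define NUM_THREADS  2    /* the RS64 IV holds two thread contexts   */
    #define MISS_LATENCY 20   /* assumed miss penalty in cycles          */

    struct hw_thread {
        int pc;               /* simplified architectural state          */
        int stall_until;      /* cycle when an outstanding miss returns  */
    };

    /* Placeholder cache model: report whether this access misses. */
    static bool access_misses(int pc) { return pc % 7 == 0; /* toy pattern */ }

    int main(void) {
        struct hw_thread t[NUM_THREADS] = { {1, 0}, {1001, 0} };
        int active = 0;

        for (int cycle = 0; cycle < 100; cycle++) {
            struct hw_thread *cur = &t[active];

            if (cur->stall_until > cycle) {
                /* Active thread waits on a miss: switch to the other
                 * thread if it is ready, instead of idling the pipeline. */
                int other = 1 - active;    /* valid for two threads only */
                if (t[other].stall_until <= cycle) {
                    active = other;
                    printf("cycle %3d: switch to thread %d\n", cycle, active);
                }
                continue;  /* a pipeline hole occurs only if no thread is ready */
            }

            if (access_misses(cur->pc))
                cur->stall_until = cycle + MISS_LATENCY;  /* stall begins next cycle */
            cur->pc++;     /* retire one instruction this cycle */
        }
        return 0;
    }

In this toy model the switch costs nothing; the paper's point is that in real hardware the switch overhead and added thread state are small (under 5% of chip area) relative to the miss latency being hidden.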
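The abstract also credits four-way set associativity with lowering the L2 miss rate: each set holds four lines, so up to four addresses that map to the same set can coexist before one must be evicted. Below is a sketch of such a lookup with LRU replacement; the set count, line size, and replacement policy are assumptions for illustration, not the RS64 IV's actual parameters.

    /*
     * Illustrative four-way set-associative cache lookup with LRU
     * replacement. Sizes and field widths are assumed, not taken
     * from the RS64 IV.
     */
    #include <stdint.h>
    #include <stdbool.h>

    #define WAYS      4
    #define NUM_SETS  1024       /* assumed set count            */
    #define LINE_BITS 7          /* assumed 128-byte cache lines */

    struct cache_line {
        bool     valid;
        uint64_t tag;
        uint64_t last_used;      /* timestamp for LRU selection  */
    };

    static struct cache_line cache[NUM_SETS][WAYS];
    static uint64_t now;         /* global access counter        */

    /* Returns true on a hit; on a miss, fills an invalid or LRU way. */
    bool cache_access(uint64_t addr) {
        uint64_t index = (addr >> LINE_BITS) % NUM_SETS;
        uint64_t tag   = addr >> LINE_BITS;
        struct cache_line *set = cache[index];
        now++;

        /* Hardware compares all four tags in parallel; we loop. */
        for (int w = 0; w < WAYS; w++) {
            if (set[w].valid && set[w].tag == tag) {
                set[w].last_used = now;
                return true;     /* hit */
            }
        }

        /* Miss: prefer an invalid way, else evict the least recently used. */
        int victim = 0;
        for (int w = 0; w < WAYS; w++) {
            if (!set[w].valid) { victim = w; break; }
            if (set[w].last_used < set[victim].last_used) victim = w;
        }
        set[victim].valid     = true;
        set[victim].tag       = tag;
        set[victim].last_used = now;
        return false;            /* miss */
    }

Relative to a direct-mapped cache of the same capacity, this organization removes conflict misses among up to four addresses per set, which is the miss-rate reduction the abstract attributes to the four-way L2.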