IBM POWER6 microarchitecture

Authors:
H. Q. Le; W. J. Starke; J. S. Fields; F. P. O'Connell; D. Q. Nguyen; B. J. Ronchetti; W. M. Sauer; E. M. Schwarz; M. T. Vaden
Affiliations:
IBM Systems and Technology Group, Austin, Texas (all authors except E. M. Schwarz); IBM Systems and Technology Group, Poughkeepsie, New York (E. M. Schwarz)
Venue:
IBM Journal of Research and Development
Year:
2007

Abstract

This paper describes the implementation of the IBM POWER6™ microprocessor, a two-way simultaneous multithreaded (SMT) dual-core chip whose key features include binary compatibility with IBM POWER5™ microprocessor-based systems; increased functional capabilities, such as decimal floating-point and vector multimedia extensions; significant reliability, availability, and serviceability enhancements; and robust scalability with up to 64 physical processors. Based on a new industry-leading high-frequency core architecture with enhanced SMT and driven by a high-throughput symmetric multiprocessing (SMP) cache and memory subsystem, the POWER6 chip achieves a significant performance boost compared with its predecessor, the POWER5 chip. Key extensions to the coherence protocol enable POWER6 microprocessor-based systems to achieve better SMP scalability while enabling reductions in system packaging complexity and cost.