The evolution of RISC technology at IBM
IBM Journal of Research and Development
The IBM RISC System/6000 processor: hardware overview
IBM Journal of Research and Development
A comparison of dynamic branch predictors that use two levels of branch history
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The PowerPC architecture: a specification for a new family of RISC processors
The PowerPC architecture: a specification for a new family of RISC processors
POWER2: next generation of the RISC System/6000 family
IBM Journal of Research and Development
Design considerations for the PowerPC 601 microprocessor
IBM Journal of Research and Development
Target prediction for indirect jumps
Proceedings of the 24th annual international symposium on Computer architecture
The Alpha 21264 Microprocessor
IEEE Micro
POWER3: the next generation of PowerPC processors
IBM Journal of Research and Development
A multithreaded PowerPC processor for commercial servers
IBM Journal of Research and Development
The circuit and physical design of the POWER4 microprocessor
IBM Journal of Research and Development
Fault-tolerant design of the IBM pSeries 690 system using POWER4 processor technology
IBM Journal of Research and Development
Profile-guided post-link stride prefetching
ICS '02 Proceedings of the 16th international conference on Supercomputing
A large, fast instruction window for tolerating cache misses
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Tradeoffs in power-efficient issue queue design
Proceedings of the 2002 international symposium on Low power electronics and design
An adaptive chip-multiprocessor architecture for future mobile terminals
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
A comparative study of arbitration algorithms for the Alpha 21364 pipelined router
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
A survey of processors with explicit multithreading
ACM Computing Surveys (CSUR)
Cherry: checkpointed early resource recycling in out-of-order microprocessors
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Dynamic binary translation for accumulator-oriented architectures
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Interface Design Techniques for Single-Chip Systems
VLSID '03 Proceedings of the 16th International Conference on VLSI Design
Energy efficient co-adaptive instruction fetch and issue
Proceedings of the 30th annual international symposium on Computer architecture
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture
Proceedings of the 30th annual international symposium on Computer architecture
Adaptive and efficient abortable mutual exclusion
Proceedings of the twenty-second annual symposium on Principles of distributed computing
LLVA: A Low-level Virtual Instruction Set Architecture
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Universal Mechanisms for Data-Parallel Architectures
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Scalable Hardware Memory Disambiguation for High ILP Processors
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Reducing Design Complexity of the Load/Store Queue
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Hardware Support for Control Transfers in Code Caches
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Power-driven Design of Router Microarchitectures in On-chip Networks
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Using Dynamic Binary Translation to Fuse Dependent Instructions
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Adaptive Cache Compression for High-Performance Processors
Proceedings of the 31st annual international symposium on Computer architecture
An advanced multichip module (MCM) for high-performance UNIX servers
IBM Journal of Research and Development
IEEE Transactions on Parallel and Distributed Systems
Microarchitectural techniques for power gating of execution units
Proceedings of the 2004 international symposium on Low power electronics and design
IBM Journal of Research and Development
Adaptive History-Based Memory Schedulers
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Managing Wire Delay in Large Chip-Multiprocessor Caches
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
An analysis of a resource efficient checkpoint architecture
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Data Centric Cache Measurement on the Intel ltanium 2 Processor
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Exploring the energy efficiency of cache coherence protocols in single-chip multi-processors
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
IBM Journal of Research and Development - Electrochemical technology in microelectronics
RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence
Proceedings of the 32nd annual international symposium on Computer Architecture
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors
Proceedings of the 32nd annual international symposium on Computer Architecture
Adaptive Mechanisms and Policies for Managing Cache Hierarchies in Chip Multiprocessors
Proceedings of the 32nd annual international symposium on Computer Architecture
Exploiting Structural Duplication for Lifetime Reliability Enhancement
Proceedings of the 32nd annual international symposium on Computer Architecture
High-Performance Throughput Computing
IEEE Micro
WBTK: a New Set of Microbenchmarks to Explore Memory System Performance for Scientific Computing
International Journal of High Performance Computing Applications
A Linux-based tool for hardware bring up, Linux development, and manufacturing
IBM Systems Journal
A NUCA substrate for flexible CMP cache sharing
Proceedings of the 19th annual international conference on Supercomputing
Improved automatic testcase synthesis for performance model validation
Proceedings of the 19th annual international conference on Supercomputing
Towards a cross-platform microbenchmark suite for evaluating hardware performance counter data
Proceedings of the 2005 conference on Diversity in computing
Formal Verification and its Impact on the Snooping versus Directory Protocol Debate
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Load-Store Queue Management: an Energy-Efficient Design Based on a State-Filtering Mechanism.
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Cherry-MP: Correctly Integrating Checkpointed Early Resource Recycling in Chip Multiprocessors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Performance implications of single thread migration on a chip multi-core
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
On the importance of optimizing the configuration of stream prefetchers
Proceedings of the 2005 workshop on Memory system performance
Constructing Virtual Architectures on a Tiled Processor
Proceedings of the International Symposium on Code Generation and Optimization
Dynamic thread assignment on heterogeneous multiprocessor architectures
Proceedings of the 3rd conference on Computing frontiers
Kilo-instruction processors, runahead and prefetching
Proceedings of the 3rd conference on Computing frontiers
Proceedings of the 33rd annual international symposium on Computer Architecture
Cooperative Caching for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
IBM Journal of Research and Development
POWER5 System microarchitecture
IBM Journal of Research and Development - POWER5 and packaging
Advanced virtualization capabilities of POWER5 systems
IBM Journal of Research and Development - POWER5 and packaging
Operating system exploitation of the POWER5 system
IBM Journal of Research and Development - POWER5 and packaging
Functional verification of the POWER5 microprocessor and POWER5 multiprocessor systems
IBM Journal of Research and Development - POWER5 and packaging
Characterization of simultaneous multithreading (SMT) efficiency in POWER5
IBM Journal of Research and Development - POWER5 and packaging
SEED: scalable, efficient enforcement of dependences
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Dense Gaussian networks: suitable topologies for on-chip multiprocessors
International Journal of Parallel Programming
Reducing cache traffic and energy with macro data load
Proceedings of the 2006 international symposium on Low power electronics and design
Substituting associative load queue with simple hash tables in out-of-order microprocessors
Proceedings of the 2006 international symposium on Low power electronics and design
Supporting microthread scheduling and synchronisation in CMPs
International Journal of Parallel Programming
Fairness and Throughput in Switch on Event Multithreading
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
DMDC: Delayed Memory Dependence Checking through Age-Based Filtering
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Memory Prefetching Using Adaptive Stream Detection
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Leveraging Optical Technology in Future Bus-based Chip Multiprocessors
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
METRIC: Memory tracing via dynamic binary rewriting to identify cache inefficiencies
ACM Transactions on Programming Languages and Systems (TOPLAS)
Making the fast case common and the uncommon case simple in unbounded transactional memory
Proceedings of the 34th annual international symposium on Computer architecture
Virtual hierarchies to support server consolidation
Proceedings of the 34th annual international symposium on Computer architecture
Proceedings of the 34th annual international symposium on Computer architecture
Late-binding: enabling unordered load-store queues
Proceedings of the 34th annual international symposium on Computer architecture
Automated design of application specific superscalar processors: an analytical approach
Proceedings of the 34th annual international symposium on Computer architecture
VPC prediction: reducing the cost of indirect branches via hardware-based dynamic devirtualization
Proceedings of the 34th annual international symposium on Computer architecture
Configurable isolation: building high availability systems with commodity multi-core processors
Proceedings of the 34th annual international symposium on Computer architecture
A study of thread migration in temperature-constrained multicores
ACM Transactions on Architecture and Code Optimization (TACO)
On Characterizing Performance of the Cell Broadband Engine Element Interconnect Bus
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
Cooperative cache partitioning for chip multiprocessors
Proceedings of the 21st annual international conference on Supercomputing
Fairness enforcement in switch on event multithreading
ACM Transactions on Architecture and Code Optimization (TACO)
Multi-core design automation challenges
Proceedings of the 44th annual Design Automation Conference
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
Performance modeling for early analysis of multi-core systems
CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Design principles for a virtual multiprocessor
Proceedings of the 2007 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries
A Decimal Floating-Point Divider Using Newton---Raphson Iteration
Journal of VLSI Signal Processing Systems
Memory scheduling for modern microprocessors
ACM Transactions on Computer Systems (TOCS)
Characterization of Apache web server with Specweb2005
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Exploring power management in multi-core systems
Proceedings of the 2008 Asia and South Pacific Design Automation Conference
Trends toward on-chip networked microsystems
International Journal of High Performance Computing and Networking
Hard real-time performances in multiprocessor-embedded systems using ASMP-Linux
EURASIP Journal on Embedded Systems - Operating System Support for Embedded Real-Time Applications
A case for low-complexity MP architectures
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Sams: single-affiliation multiple-stride parallel memory scheme
Proceedings of the 2008 workshop on Memory access on future processors: a solved problem?
IBM Journal of Research and Development
A Two-Level Load/Store Queue Based on Execution Locality
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Low-Cost Adaptive Data Prefetching
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Soft-error resilience of the IBM POWER6 processor
IBM Journal of Research and Development
Stateful hardware decompression in networking environment
Proceedings of the 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems
Using age registers for a simple load-store queue filtering
Journal of Systems Architecture: the EUROMICRO Journal
Accelerating critical section execution with asymmetric multi-core architectures
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Power supply noise aware workload assignment for multi-core systems
Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design
Prefetch-Aware DRAM Controllers
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
A performance-correctness explicitly-decoupled architecture
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Accurate Instruction Pre-scheduling in Dynamically Scheduled Processors
Transactions on High-Performance Embedded Architectures and Compilers II
Memory mapped ECC: low-cost error protection for last level caches
Proceedings of the 36th annual international symposium on Computer architecture
InvisiFence: performance-transparent memory ordering in conventional multiprocessors
Proceedings of the 36th annual international symposium on Computer architecture
Instruction-level simulation of a cluster at scale
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Flexible cache error protection using an ECC FIFO
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
The circuit and physical design of the POWER4 microprocessor
IBM Journal of Research and Development
Functional verification of the POWER4 microprocessor and POWER4 multiprocessor systems
IBM Journal of Research and Development
An hybrid eDRAM/SRAM macrocell to implement first-level data caches
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Research on synthesis parameter real-time scheduling algorithm on multi-core architecture
CCDC'09 Proceedings of the 21st annual international conference on Chinese control and decision conference
Evaluation of OpenMP for the cyclops multithreaded architecture
WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
Early-stage definition of LPX: a low power issue-execute processor
PACS'02 Proceedings of the 2nd international conference on Power-aware computer systems
PACS'02 Proceedings of the 2nd international conference on Power-aware computer systems
Decoupled state-execute architecture
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Out-of-order issue logic using sorting networks
Proceedings of the 20th symposium on Great lakes symposium on VLSI
rMPI: message passing on multicore processors with on-chip interconnect
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Global management of cache hierarchies
Proceedings of the 7th ACM international conference on Computing frontiers
Timing local streams: improving timeliness in data prefetching
Proceedings of the 24th ACM International Conference on Supercomputing
Data marshaling for multi-core architectures
Proceedings of the 37th annual international symposium on Computer architecture
Token tenure and PATCH: A predictive/adaptive token-counting hybrid
ACM Transactions on Architecture and Code Optimization (TACO)
IBM Journal of Research and Development
Improving yield and reliability of chip multiprocessors
Proceedings of the Conference on Design, Automation and Test in Europe
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Optimizing Sparse Data Structures for Matrix-vector Multiply
International Journal of High Performance Computing Applications
CRIB: consolidated rename, issue, and bypass
Proceedings of the 38th annual international symposium on Computer architecture
Proceedings of the 38th annual international symposium on Computer architecture
IBM POWER7 multicore server processor
IBM Journal of Research and Development
IBM Journal of Research and Development
ISPA'04 Proceedings of the Second international conference on Parallel and Distributed Processing and Applications
When Prefetching Works, When It Doesn’t, and Why
ACM Transactions on Architecture and Code Optimization (TACO)
CRUISE: cache replacement and utility-aware scheduling
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
A power-efficient and scalable load-store queue design
PATMOS'05 Proceedings of the 15th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems
An optimized front-end physical register file with banking and writeback filtering
PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
ACM Transactions on Computer Systems (TOCS)
Paradigmatic shifts for exascale supercomputing
The Journal of Supercomputing
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Predicting Performance Impact of DVFS for Realistic Memory Systems
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Low-Latency Mechanisms for Near-Threshold Operation of Private Caches in Shared Memory Multicores
MICROW '12 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture Workshops
Orchestrated scheduling and prefetching for GPGPUs
Proceedings of the 40th Annual International Symposium on Computer Architecture
An energy-efficient and scalable eDRAM-based register file architecture for GPGPU
Proceedings of the 40th Annual International Symposium on Computer Architecture
Journal of Parallel and Distributed Computing
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A cylinder computation model for many-core parallel computing
Theoretical Computer Science
On-chip ring network designs for hard-real time systems
Proceedings of the 21st International conference on Real-Time Networks and Systems
Virtually split cache: An efficient mechanism to distribute instructions and data
ACM Transactions on Architecture and Code Optimization (TACO)
Applications of the streamed storage format for sparse matrix operations
International Journal of High Performance Computing Applications
Hi-index | 0.00 |
The IBM POWER4 is a new microprocessor organized in a system structure that includes new technology to form systems. The name POWER4 as used in this context refers not only to a chip, but also to the structure used to interconnect chips to form systems. In this paper we describe the processor microarchitecture as well as the interconnection architecture employed to form systems up to a 32-way symmetric multiprocessor.