This paper describes the implementation of the IBM POWER5™ chip, a two-way simultaneous multithreaded dual-core chip, and systems based on it. With a key goal of maintaining both binary and structural compatibility with POWER4™ systems, the POWER5 microprocessor allows system scalability to 64 physical processors. A POWER5 system allows both single-threaded and multithreaded execution modes. In single-threaded execution mode, a POWER5 system allows for higher performance than its predecessor POWER4 system at equivalent frequencies. In multithreaded execution mode, the POWER5 microprocessor implements dynamic resource balancing to ensure that each thread receives its fair share of system resources. Additionally, software-settable thread priority is enforced by the POWER5 hardware. To conserve power, the POWER5 chip implements dynamic power management that allows reduced power consumption without affecting performance.