The Stanford Dash Multiprocessor
Computer
IBM Journal of Research and Development - Electrochemical technology in microelectronics
FPU Implementations with Denormalized Numbers
IEEE Transactions on Computers
POWER5 System microarchitecture
IBM Journal of Research and Development - POWER5 and packaging
ARITH '07 Proceedings of the 18th IEEE Symposium on Computer Arithmetic
IBM POWER6 accelerators: VMX and DFU
IBM Journal of Research and Development
IBM Journal of Research and Development
IBM POWER6 partition mobility: moving virtual servers seamlessly between physical systems
IBM Journal of Research and Development
IBM Journal of Research and Development
POWER3: the next generation of PowerPC processors
IBM Journal of Research and Development
POWER4 system microarchitecture
IBM Journal of Research and Development
End-to-end performance of commercial applications in the face of changing hardware
ACM SIGOPS Operating Systems Review
Sams: single-affiliation multiple-stride parallel memory scheme
Proceedings of the 2008 workshop on Memory access on future processors: a solved problem?
Software-Controlled Priority Characterization of POWER5 Processor
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
A dynamic scheduler for balancing HPC applications
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Soft-error resilience of the IBM POWER6 processor
IBM Journal of Research and Development
Soft-error resilience of the IBM POWER6 processor input/output subsystem
IBM Journal of Research and Development
Phaser: phased methodology for modeling the system-level effects of soft errors
IBM Journal of Research and Development
Notary: Hardware techniques to enhance signatures
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Prefetch-Aware DRAM Controllers
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Quantitative analysis of sequence alignment applications on multiprocessor architectures
Proceedings of the 6th ACM conference on Computing frontiers
Stream chaining: exploiting multiple levels of correlation in data prefetching
Proceedings of the 36th annual international symposium on Computer architecture
Scaling the bandwidth wall: challenges in and avenues for CMP scaling
Proceedings of the 36th annual international symposium on Computer architecture
A Scalable Non-blocking Multicast Scheme for Distributed DAG Scheduling
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
Evaluation of the SUN UltraSparc T2+ Processor for Computational Science
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
Dynamic power gating with quality guarantees
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Dealing with Traffic-Area Trade-Off in Direct Coherence Protocols for Many-Core CMPs
APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
How a Java VM can get more from a hardware performance monitor
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
NUMA-aware memory manager with dominant-thread-based copying GC
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Tribeca: design for PVT variations with local recovery and fine-grained adaptation
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Optimizing shared cache behavior of chip multiprocessors
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Thread to strand binding of parallel network applications in massive multi-threaded systems
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Modeling and simulating flash based solid-state disks for operating systems
Proceedings of the first joint WOSP/SIPEW international conference on Performance engineering
A scalable organization for distributed directories
Journal of Systems Architecture: the EUROMICRO Journal
Adaptive execution techniques of parallel programs for multiprocessors
Journal of Parallel and Distributed Computing
Reusing cached schedules in an out-of-order processor with in-order issue logic
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
The auction: optimizing banks usage in Non-Uniform Cache Architectures
Proceedings of the 24th ACM International Conference on Supercomputing
Cohesion: a hybrid memory model for accelerators
Proceedings of the 37th annual international symposium on Computer architecture
An Adaptive Data Prefetcher for High-Performance Processors
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Design and microarchitecture of the IBM system z10 microprocessor
IBM Journal of Research and Development
IBM system z10 processor cache subsystem microarchitecture
IBM Journal of Research and Development
SIP server performance on multicore systems
IBM Journal of Research and Development
VoIP performance on multicore platforms
IBM Journal of Research and Development
A survey of hardware designs for decimal arithmetic
IBM Journal of Research and Development
Power and thermal characterization of POWER6 system
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Optimizing Sparse Data Structures for Matrix-vector Multiply
International Journal of High Performance Computing Applications
Low Power Design for a Multi-core Multi-thread Microprocessor
GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
Power and energy-aware processor scheduling
Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering
Automatic performance model synthesis from hardware verification models
Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering
Advances in simultaneous multithreading testcase generation methods
HVC'10 Proceedings of the 6th international conference on Hardware and software: verification and testing
Understanding POWER multiprocessors
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Studying the impact of hardware prefetching and bandwidth partitioning in chip-multiprocessors
Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks
Proceedings of the 38th annual international symposium on Computer architecture
Prefetch-aware shared resource management for multi-core systems
Proceedings of the 38th annual international symposium on Computer architecture
IBM POWER7 multicore server processor
IBM Journal of Research and Development
IBM Journal of Research and Development
Studying the impact of hardware prefetching and bandwidth partitioning in chip-multiprocessors
ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
Filtering directory lookups in CMPs with write-through caches
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
PaCT'11 Proceedings of the 11th international conference on Parallel computing technologies
Filtering directory lookups in CMPs
Microprocessors & Microsystems
ABS: A low-cost adaptive controller for prefetching in a banked shared last-level cache
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Continuous object access profiling and optimizations to overcome the memory wall and bloat
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Minimalist open-page: a DRAM page-mode scheduling policy for the many-core era
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Fast poisson solvers for thermal analysis
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special section on verification challenges in the concurrent world
Identifying the sources of cache misses in Java programs without relying on hardware counters
Proceedings of the 2012 international symposium on Memory Management
Enhancing the performance of assisted execution runtime systems through hardware/software techniques
Proceedings of the 26th ACM international conference on Supercomputing
Adaptive multi-level compilation in a trace-based Java JIT compiler
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Power Aware Meta Scheduler for Adaptive VM Provisioning in IaaS Cloud
International Journal of Cloud Applications and Computing
Discerning the dominant out-of-order performance advantage: is it speculation or dynamism?
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Predicting Performance Impact of DVFS for Realistic Memory Systems
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Reuse-based online models for caches
Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems
Investigating hybrid SSD FTL schemes for Hadoop workloads
Proceedings of the ACM International Conference on Computing Frontiers
NBTI mitigation by optimized NOP assignment and insertion
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
S/DC: a storage and energy efficient data prefetcher
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
MLP-aware dynamic instruction window resizing for adaptively exploiting both ILP and MLP
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Virtually split cache: An efficient mechanism to distribute instructions and data
ACM Transactions on Architecture and Code Optimization (TACO)
Modeling the impact of permanent faults in caches
ACM Transactions on Architecture and Code Optimization (TACO)
Applications of the streamed storage format for sparse matrix operations
International Journal of High Performance Computing Applications
Hi-index | 0.00 |
This paper describes the implementation of the IBM POWER6™ microprocessor, a two-way simultaneous multithreaded (SMT) dual-core chip whose key features include binary compatibility with IBM POWER5™ microprocessor-based systems; increased functional capabilities, such as decimal floating-point and vector multimedia extensions; significant reliability, availability, and serviceability enhancements; and robust scalability with up to 64 physical processors. Based on a new industry-leading high-frequency core architecture with enhanced SMT and driven by a high-throughput symmetric multiprocessing (SMP) cache and memory subsystem, the POWER6 chip achieves a significant performance boost compared with its predecessor, the POWER5 chip. Key extensions to the coherence protocol enable POWER6 microprocessor-based systems to achieve better SMP scalability while enabling reductions in system packaging complexity and cost.