The POWER3 processor is a high-performance microprocessor which excels at technical computing. Designed by IBM and deployed in various IBM RS/6000® systems, the superscalar RISC POWER3 processor boasts many advanced features which give it exceptional performance on challenging applications from the workstation to the supercomputer level. In this paper, we describe the microarchitectural features of the POWER3 processor, particularly those which are unique or significant to the performance of the chip, such as the data prefetch engine, nonblocking and interleaved data cache, and dual multiply-add-fused floating-point execution units. Additionally, the performance of specific instruction sequences and kernels is described to quantify and further illuminate the performance attributes of the POWER3 processor.
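As a rough illustration of the kind of kernel such measurements tend to involve, here is a minimal sketch of a dot product in C. It is not taken from the paper. The function name `dot_two_acc` and the choice of two accumulators are assumptions made for this example: two independent multiply-add chains give a machine with two floating-point units work for both, and the unit-stride reads are the sort of access pattern a hardware data prefetcher is meant to detect. Whether a compiler actually fuses each multiply and add into one instruction depends on the target and the compiler flags.

```c
#include <stddef.h>

/* Hypothetical example kernel, not from the paper.
 * Computes sum(x[i] * y[i]) for i in [0, n). */
double dot_two_acc(const double *x, const double *y, size_t n)
{
    /* Two independent accumulators: the two multiply-add chains do not
     * depend on each other, so they can be issued side by side. */
    double acc0 = 0.0;
    double acc1 = 0.0;
    size_t i = 0;

    /* Unit-stride loop, two elements per iteration. Each statement is a
     * multiply followed by an add, which a compiler may contract into a
     * single fused multiply-add where the hardware provides one. */
    for (; i + 1 < n; i += 2) {
        acc0 += x[i] * y[i];
        acc1 += x[i + 1] * y[i + 1];
    }

    /* Handle the leftover element when n is odd. */
    if (i < n) {
        acc0 += x[i] * y[i];
    }

    return acc0 + acc1;
}
```

Splitting the sum across two accumulators changes the order of the floating-point additions, so for general inputs the result can differ in the last bits from a single-accumulator loop.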