Advanced performance features of the 64-bit PA-8000

Authors:
D. Hunt
Venue:
COMPCON '95 Proceedings of the 40th IEEE Computer Society International Conference
Year:
1995

Cited by: 36

Improving cache performance with balanced tag and data paths
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems

Area and performance tradeoffs in floating-point divide and square-root implementations
ACM Computing Surveys (CSUR)

Instruction scheduling for the HP PA-8000
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture

Aggressive inlining
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation

Increasing memory bandwidth with wide buses: compiler, hardware and performance trade-offs
ICS '97 Proceedings of the 11th international conference on Supercomputing

Speculative execution via address prediction and data prefetching
ICS '97 Proceedings of the 11th international conference on Supercomputing

Designing high bandwidth on-chip caches
Proceedings of the 24th annual international symposium on Computer architecture

Dynamic speculation and synchronization of data dependences
Proceedings of the 24th annual international symposium on Computer architecture

Data prefetching on the HP PA-8000
Proceedings of the 24th annual international symposium on Computer architecture

On high-bandwidth data cache design for multi-issue processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture

The design and performance of a conflict-avoiding cache
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture

Prediction caches for superscalar processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture

Division Algorithms and Implementations
IEEE Transactions on Computers

Characterizing Distributed Shared Memory Performance: A Case Study of the Convex SPP1000
IEEE Transactions on Parallel and Distributed Systems

Speculative multithreaded processors
ICS '98 Proceedings of the 12th international conference on Supercomputing

Memory dependence prediction using store sets
Proceedings of the 25th annual international symposium on Computer architecture

Effects of architectural and technological advances on the HP/Convex Exemplar's memory and communication performance
Proceedings of the 25th annual international symposium on Computer architecture

Randomized Cache Placement for Eliminating Conflicts
IEEE Transactions on Computers - Special issue on cache memory and related problems

Speculation techniques for improving load related instruction scheduling
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture

Memory forwarding: enabling aggressive layout optimizations by guaranteeing the safety of data relocation
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture

Instruction fetch mechanisms for multipath execution processors
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture

Access region locality for high-bandwidth processor memory system design
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture

Value-based clock gating and operation packing: dynamic strategies for improving processor power and performance
ACM Transactions on Computer Systems (TOCS)

Performance analysis using the MIPS R10000 performance counters
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing

A High-Bandwidth Memory Pipeline for Wide Issue Processors
IEEE Transactions on Computers

When Caches Aren't Enough: Data Prefetching Techniques
Computer

Accelerating Multimedia with Enhanced Microprocessors
IEEE Micro

VIS Speeds New Media Processing
IEEE Micro

Subword Parallelism with MAX-2
IEEE Micro

The Design Space of Register Renaming Techniques
IEEE Micro

Architectural Considerations for Application-Specific Counterflow Pipelines
ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI

Mid-Range and High-End PA RISC Computer Systems
COMPCON '96 Proceedings of the 41st IEEE International Computer Conference

64-bit and Multimedia Extensions in the PA-RISC 2.0 Architecture
COMPCON '96 Proceedings of the 41st IEEE International Computer Conference

Instruction-level parallel processors-dynamic and static scheduling tradeoffs
PAS '97 Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis

References
Sourcebook of parallel computing

Bridge floating-point fused multiply-add design
IEEE Transactions on Very Large Scale Integration (VLSI) Systems


Abstract

The PA-8000 is Hewlett-Packard's first CPU to implement the new 64-bit PA-RISC 2.0 (PA2.0) architecture. It combines a high clock frequency with a number of advanced microarchitectural features to deliver industry-leading performance on commercial and technical applications while maintaining full compatibility with all previous PA-RISC binaries. Among these features are a 56-entry instruction reorder buffer to support out-of-order execution; a branch target address cache; a branch history table; support for multiple outstanding cache misses; and dual integer, load/store, floating-point multiply/accumulate, and divide/square-root units, which allow execution of four instructions per cycle. Together these features will enable the PA-8000 to sustain superscalar operation on a wide variety of workloads.
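
As a rough illustration of the reorder buffer named in the abstract, the following Python sketch models a buffer in which instructions may finish executing in any order but leave the buffer only in program order. Only the 56-entry capacity comes from the abstract. The class name `ReorderBuffer`, its methods, the in-order retirement policy, and the per-cycle retire limit `WIDTH` are illustrative assumptions, not a description of the PA-8000's actual implementation.

```python
from collections import deque

ROB_ENTRIES = 56   # capacity stated in the abstract
WIDTH = 4          # assumed per-cycle retire limit (illustrative, not from the source)


class ReorderBuffer:
    """Toy reorder buffer: completion may be out of order, retirement is in order."""

    def __init__(self, capacity=ROB_ENTRIES):
        self.capacity = capacity
        self.entries = deque()   # oldest instruction sits at the left end
        self.done = set()        # ids of instructions that have finished executing

    def insert(self, instr_id):
        """Add an instruction in program order; return False if the buffer is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(instr_id)
        return True

    def complete(self, instr_id):
        """Mark an instruction as executed; calls may arrive in any order."""
        self.done.add(instr_id)

    def retire(self, width=WIDTH):
        """Retire up to `width` finished instructions from the head, in program order."""
        retired = []
        while self.entries and len(retired) < width and self.entries[0] in self.done:
            instr_id = self.entries.popleft()
            self.done.discard(instr_id)
            retired.append(instr_id)
        return retired


rob = ReorderBuffer()
for i in range(6):
    rob.insert(i)

# Instructions 1..3 finish before instruction 0 (say, 0 is a load that missed the cache).
for i in (1, 2, 3):
    rob.complete(i)
first = rob.retire()    # the oldest instruction is not done, so nothing retires

rob.complete(0)
second = rob.retire()   # instructions 0..3 now retire together, in program order
```

In this sketch a stalled oldest instruction blocks retirement without blocking execution of younger ones, which is why a larger buffer can hide longer delays such as cache misses.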