ProfileMe: hardware support for instruction-level profiling on out-of-order processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Memory dependence prediction using store sets
Proceedings of the 25th annual international symposium on Computer architecture
An empirical study of decentralized ILP execution models
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Functional Implementation Techniques for CPU Cache Memories
IEEE Transactions on Computers - Special issue on cache memory and related problems
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
OS and compiler considerations in the design of the IA-64 architecture
ACM SIGPLAN Notices
OS and compiler considerations in the design of the IA-64 architecture
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
The Alpha 21264 Microprocessor
IEEE Micro
Hierarchical Interconnects for On-Chip Clustering
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Architectural Considerations for Application-Specific Counterflow Pipelines
ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
A Statistically Rigorous Approach for Improving Simulation Methodology
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
SEPAS: a highly accurate energy-efficient branch predictor
Proceedings of the 2004 international symposium on Low power electronics and design
An analysis of a resource efficient checkpoint architecture
ACM Transactions on Architecture and Code Optimization (TACO)
Scalability Aspects of Instruction Distribution Algorithms for Clustered Processors
IEEE Transactions on Parallel and Distributed Systems
Instruction Replication for Reducing Delays Due to Inter-PE Communication Latency
IEEE Transactions on Computers
Address-Indexed Memory Disambiguation and Store-to-Load Forwarding
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 34th annual international symposium on Computer architecture
Complexity Effective Bypass Networks
Transactions on High-Performance Embedded Architectures and Compilers II
Single FU bypass networks for high clock rate superscalar processors
HiPC'04 Proceedings of the 11th international conference on High Performance Computing
ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
Hi-index | 0.00 |
This paper describes the internal organization of the 21264, a 500 MHz, Out-Of-Order, quad-fetch, six-way issue microprocessor. The aggressive cycle-time of the 21264 in combination with many architectural innovations, such as out-of-order and speculative execution, enable this microprocessor to deliver an estimated 30 SpecInt95 and 50 SpecFp95 performance. In addition, the 21264 can sustain 5+ Gigabytes/sec of bandwidth to an L2 cache and 3+ Gigabytes/sec to memory for high performance on memory-intensive applications.