The Alpha 21264 Microprocessor

Authors:
R. E. Kessler
Venue:
IEEE Micro
Year:
1999

Citing 4

Superscalar Instruction Execution in the 21164 Alpha Microprocessor

IEEE Micro
The Alpha 21264: A 500 MHz Out-of-Order Execution Microprocessor

COMPCON '97 Proceedings of the 42nd IEEE International Computer Conference
The Alpha 21264 Microprocessor Architecture

ICCD '98 Proceedings of the International Conference on Computer Design
Circuit Implementation of a 600MHz Superscalar RISC Microprocessor

ICCD '98 Proceedings of the International Conference on Computer Design

Cited 264

Delaying physical register allocation through virtual-physical registers

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
The use of multithreading for exception handling

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Selective cache ways: on-demand cache resource allocation

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
The Multicluster Architecture: Reducing Processor Cycle Time Through Partitioning

International Journal of Parallel Programming
Performance analysis of the Alpha 21264-based Compaq ES40 system

Proceedings of the 27th annual international symposium on Computer architecture
Circuits for wide-window superscalar processors

Proceedings of the 27th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures

Proceedings of the 27th annual international symposium on Computer architecture
Multiple-banked register file architectures

Proceedings of the 27th annual international symposium on Computer architecture
Optimization of high-performance superscalar architectures for energy efficiency

ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
Architecture and design of AlphaServer GS320

ACM SIGPLAN Notices
The impact of delay on the design of branch predictors

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Two-level hierarchical register file organization for VLIW processors

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Register integration: a simple and efficient implementation of squash reuse

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
A circuit level implementation of an adaptive issue queue for power-aware microprocessors

GLSVLSI '01 Proceedings of the 11th Great Lakes symposium on VLSI
Inherently Lower-Power High-Performance Superscalar Architectures

IEEE Transactions on Computers
Architecture and design of AlphaServer GS320

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Dynamically allocating processor resources between nearby and distant ILP

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Measuring experimental error in microprocessor simulation

SSR '01 Proceedings of the 2001 symposium on Software reusability: putting software reuse in context
Energy reduction in queues and stacks by adaptive bitwidth compression

ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Choosing representative slices of program execution for microarchitecture simulations: a preliminary application to the data stream

Workload characterization of emerging computer applications
A flexible accelerator for layer 7 networking applications

Proceedings of the 39th annual Design Automation Conference
Low-complexity reorder buffer architecture

ICS '02 Proceedings of the 16th international conference on Supercomputing
Profile-guided post-link stride prefetching

ICS '02 Proceedings of the 16th international conference on Supercomputing
Bloom filtering cache misses for accurate data speculation and prefetching

ICS '02 Proceedings of the 16th international conference on Supercomputing
Increasing processor performance by implementing deeper pipelines

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A large, fast instruction window for tolerating cache misses

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
An instruction set and microarchitecture for instruction level distributed processing

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A scalable instruction queue design using dependence chains

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Multithreading decoupled architectures for complexity-effective general purpose computing

ACM SIGARCH Computer Architecture News - Special Issue: PACT 2001 workshops
A design space evaluation of grid processor architectures

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Direct load: dependence-linked dataflow resolution of load address and cache coordinate

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Reducing the complexity of the register file in dynamic superscalar processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Correctly implementing value prediction in microprocessors that support multithreading or multiprocessing

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Measuring Experimental Error in Microprocessor Simulation

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Fine-grain CAM-tag cache resizing using miss tags

Proceedings of the 2002 international symposium on Low power electronics and design
Neural methods for dynamic branch prediction

ACM Transactions on Computer Systems (TOCS)
NetBench: a benchmarking suite for network processors

Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Control-Flow Speculation through Value Prediction

IEEE Transactions on Computers
Access Control Mechanisms in a Distributed, Persistent Memory System

IEEE Transactions on Parallel and Distributed Systems
Typing the ISA to cluster the processor

Future Generation Computer Systems - Parallel computing technologies (PaCT-2001)
An Adaptive Issue Queue for Reduced Power at High Performance

PACS '00 Proceedings of the First International Workshop on Power-Aware Computer Systems-Revised Papers
Typing the ISA to Cluster the Processor

PaCT '01 Proceedings of the 6th International Conference on Parallel Computing Technologies
High Performance and Energy Efficient Serial Prefetch Architecture

ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Integrated I-cache Way Predictor and Branch Target Buffer to Reduce Energy Consumption

ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
A Programmable Memory Hierarchy for Prefetching Linked Data Structures

ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Speculative Clustered Caches for Clustered Processors

ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Improving Conditional Branch Prediction on Speculative Multithreading Architectures

Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
A Register File Architecture and Compilation Scheme for Clustered ILP Processors

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Reuse Distance-Based Cache Hint Selection

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Applying Machine Learning for Ensemble Branch Predictors

IEA/AIE '02 Proceedings of the 15th international conference on Industrial and engineering applications of artificial intelligence and expert systems: developments in applied artificial intelligence
Reordering Memory Bus Transactions for Reduced Power Consumption

LCTES '00 Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems
Cached Two-Level Adaptive Branch Predictors with Multiple Stages

ARCS '02 Proceedings of the International Conference on Architecture of Computing Systems: Trends in Network and Pervasive Computing
Hierarchical Scheduling Windows

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Three extensions to register integration

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Master/slave speculative parallelization

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Three-dimensional memory vectorization for high bandwidth media memory systems

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Managing static leakage energy in microprocessor functional units

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Register write specialization register read specialization: a path to complexity-effective wide-issue superscalar processors

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Exploiting data-width locality to increase superscalar execution bandwidth

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Dynamic binary translation for accumulator-oriented architectures

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Phi-Predication for light-weight if-conversion

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
On the Design of a High-Performance Adaptive Router for CC-NUMA Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Predicate prediction for efficient out-of-order execution

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Power-efficient issue queue design

Power aware computing
Front-End Policies for Improved Issue Efficiency in SMT Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
A Statistically Rigorous Approach for Improving Simulation Methodology

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Microarchitecture and Performance Analysis of a SPARC-V9 Microprocessor for Enterprise Server Systems

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Reconsidering Complex Branch Predictors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Banked multiported register files for high-frequency superscalar microprocessors

Proceedings of the 30th annual international symposium on Computer architecture
Effective ahead pipelining of instruction block address generation

Proceedings of the 30th annual international symposium on Computer architecture
Dynamically managing the communication-parallelism trade-off in future clustered processors

Proceedings of the 30th annual international symposium on Computer architecture
Overcoming the limitations of conventional vector processors

Proceedings of the 30th annual international symposium on Computer architecture
Reducing reorder buffer complexity through selective operand caching

Proceedings of the 2003 international symposium on Low power electronics and design
Reducing data cache energy consumption via cached load/store queue

Proceedings of the 2003 international symposium on Low power electronics and design
On load latency in low-power caches

Proceedings of the 2003 international symposium on Low power electronics and design
Microprocessor pipeline energy analysis

Proceedings of the 2003 international symposium on Low power electronics and design
Address-free memory access based on program syntax correlation of loads and stores

IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on the 2001 international conference on computer design (ICCD)
An Experimental Study of Polylogarithmic, Fully Dynamic, Connectivity Algorithms

Journal of Experimental Algorithmics (JEA)
Beating in-order stalls with "flea-flicker" two-pass pipelining

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Exploiting Value Locality in Physical Register Files

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Fast Path-Based Neural Branch Prediction

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Design and Implementation of High-Performance Memory Systems for Future Packet Buffers

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
TLC: Transmission Line Caches

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Two-level branch prediction using neural networks

Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Synthesis and verification
Reducing register pressure through LAER algorithm

ACSC '04 Proceedings of the 27th Australasian conference on Computer science - Volume 26
TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP

ACM Transactions on Architecture and Code Optimization (TACO)
Complexity-Effective Reorder Buffer Designs for Superscalar Processors

IEEE Transactions on Computers
Isolating Short-Lived Operands for Energy Reduction

IEEE Transactions on Computers
A general decomposition strategy for verifying register renaming

Proceedings of the 41st annual Design Automation Conference
Energy Efficient Comparators for Superscalar Datapaths

IEEE Transactions on Computers
Scaling the issue window with look-ahead latency prediction

Proceedings of the 18th annual international conference on Supercomputing
Back-end assignment schemes for clustered multithreaded processors

Proceedings of the 18th annual international conference on Supercomputing
Cluster prefetch: tolerating on-chip wire delays in clustered microarchitectures

Proceedings of the 18th annual international conference on Supercomputing
Wire Delay is Not a Problem for SMT (In the Near Future)

Proceedings of the 31st annual international symposium on Computer architecture
SMTp: An Architecture for Next-generation Scalable Multi-threading

Proceedings of the 31st annual international symposium on Computer architecture
Adaptive Cache Compression for High-Performance Processors

Proceedings of the 31st annual international symposium on Computer architecture
Use-Based Register Caching with Decoupled Indexing

Proceedings of the 31st annual international symposium on Computer architecture
Late Allocation and Early Release of Physical Registers

IEEE Transactions on Computers
Reducing Power Consumption for High-Associativity Data Caches in Embedded Processors

DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
A scalable, clustered SMT processor for digital signal processing

MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
Scalable selective re-execution for EDGE architectures

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Wrong Path Events: Exploiting Unusual and Illegal Program Behavior for Early Misprediction Detection and Recovery

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Cache Refill/Access Decoupling for Vector Machines

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Using a serial cache for energy efficient instruction fetching

Journal of Systems Architecture: the EUROMICRO Journal
Tolerating memory latency through push prefetching for pointer-intensive applications

ACM Transactions on Architecture and Code Optimization (TACO)
Increasing Register File Immunity to Transient Errors

Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Memory coherence activity prediction in commercial workloads

WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Cache organizations for clustered microarchitectures

WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Scalable cache memory design for large-scale SMT architectures

WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Load elimination for low-power embedded processors

GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
A Speculative Control Scheme for an Energy-Efficient Banked Register File

IEEE Transactions on Computers
Improved latency and accuracy for neural branch prediction

ACM Transactions on Computer Systems (TOCS)
Exploiting temporal locality in drowsy cache policies

Proceedings of the 2nd conference on Computing frontiers
Organization and implementation of the register-renaming mapper for out-of-order IBM POWER4 processors

IBM Journal of Research and Development - Electrochemical technology in microelectronics
Code placement for improving dynamic branch prediction accuracy

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Energy-efficient physically tagged caches for embedded processors with virtual memory

Proceedings of the 42nd annual Design Automation Conference
Analysis of the O-GEometric History Length Branch Predictor

Proceedings of the 32nd annual international symposium on Computer Architecture
Snug set-associative caches: reducing leakage power while improving performance

ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Reducing latencies of pipelined cache accesses through set prediction

Proceedings of the 19th annual international conference on Supercomputing
An asymmetric clustered processor based on value content

Proceedings of the 19th annual international conference on Supercomputing
Exploring the limits of leakage power reduction in caches

ACM Transactions on Architecture and Code Optimization (TACO)
Restrictive Compression Techniques to Increase Level 1 Cache Capacity

ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Load-Store Queue Management: an Energy-Efficient Design Based on a State-Filtering Mechanism

ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Reducing the Energy of Speculative Instruction Schedulers

ICCD '05 Proceedings of the 2005 International Conference on Computer Design
A Criticality Analysis of Clustering in Superscalar Processors

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Scalable Store-Load Forwarding via Store Queue Index Prediction

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
"Flea-flicker" Multipass Pipelining: An Alternative to the High-Power Out-of-Order Offense

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Beating In-Order Stalls with "Flea-Flicker" Two-Pass Pipelining

IEEE Transactions on Computers
Exploring the performance of split data cache schemes on superscalar processors and symmetric multiprocessors

Journal of Systems Architecture: the EUROMICRO Journal
Software and hardware techniques to optimize register file utilization in VLIW architectures

International Journal of Parallel Programming
Compiling for EDGE Architectures

Proceedings of the International Symposium on Code Generation and Optimization
Dynamic thread assignment on heterogeneous multiprocessor architectures

Proceedings of the 3rd conference on Computing frontiers
Speculative early register release

Proceedings of the 3rd conference on Computing frontiers
Vulnerability analysis of L2 cache elements to single event upsets

Proceedings of the conference on Design, automation and test in Europe: Proceedings
Intelligent memory manager: reducing cache pollution due to memory management functions

Journal of Systems Architecture: the EUROMICRO Journal
Nooks: an architecture for reliable device drivers

EW 10 Proceedings of the 10th workshop on ACM SIGOPS European workshop
Slackened Memory Dependence Enforcement: Combining Opportunistic Forwarding with Decoupled Verification

Proceedings of the 33rd annual international symposium on Computer Architecture
Reducing Rename Logic Complexity for High-Speed and Low-Power Front-End Architectures

IEEE Transactions on Computers
Decomposing the load-store queue by function for power reduction and scalability

IBM Journal of Research and Development
Using the first-level caches as filters to reduce the pollution caused by speculative memory references

International Journal of Parallel Programming
Microarchitecture of the Godson-2 processor

Journal of Computer Science and Technology
SPARTAN: speculative avoidance of register allocations to transient values for performance and energy efficiency

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Overlapping dependent loads with addressless preload

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Long-latency branches: how much do they matter?

ACM SIGARCH Computer Architecture News
Throttling-Based Resource Management in High Performance Multithreaded Architectures

IEEE Transactions on Computers
Early Register Deallocation Mechanisms Using Checkpointed Register Files

IEEE Transactions on Computers
A case for a complexity-effective, width-partitioned microarchitecture

ACM Transactions on Architecture and Code Optimization (TACO)
Reducing cache traffic and energy with macro data load

Proceedings of the 2006 international symposium on Low power electronics and design
Register file caching for energy efficiency

Proceedings of the 2006 international symposium on Low power electronics and design
Design space exploration for multicore architectures: a power/performance/thermal view

Proceedings of the 20th annual international conference on Supercomputing
Reducing Data Cache Susceptibility to Soft Errors

IEEE Transactions on Dependable and Secure Computing
Dataflow Predication

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
NoSQ: Store-Load Communication without a Store Queue

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Distributed Microarchitectural Protocols in the TRIPS Prototype Processor

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Mitigating the Impact of Process Variations on Processor Register Files and Execution Units

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Snug set-associative caches: Reducing leakage power of instruction and data caches with no performance penalties

ACM Transactions on Architecture and Code Optimization (TACO)
Register port complexity reduction in wide-issue processors with selective instruction execution

Microprocessors & Microsystems
A comparison of two policies for issuing instructions speculatively

Journal of Systems Architecture: the EUROMICRO Journal
Compacting register file via 2-level renaming and bit-partitioning

Microprocessors & Microsystems
A predictive decode filter cache for reducing power consumption in embedded processors

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Microarchitecture parameter selection to optimize system performance under process variation

Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design
Reducing I-cache energy of multimedia applications through low cost tag comparison elimination

Journal of Embedded Computing - Cache exploitation in embedded systems
Reducing non-deterministic loads in low-power caches via early cache set resolution

Microprocessors & Microsystems
Core fusion: accommodating software diversity in chip multiprocessors

Proceedings of the 34th annual international symposium on Computer architecture
ReCycle: pipeline adaptation to tolerate process variation

Proceedings of the 34th annual international symposium on Computer architecture
VPC prediction: reducing the cost of indirect branches via hardware-based dynamic devirtualization

Proceedings of the 34th annual international symposium on Computer architecture
Profile-assisted Compiler Support for Dynamic Predication in Diverge-Merge Processors

Proceedings of the International Symposium on Code Generation and Optimization
Implementation and Evaluation of a Dynamically Routed Processor Operand Network

NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
SATSim: a superscalar architecture trace simulator using interactive animation

WCAE '00 Proceedings of the 2000 workshop on Computer architecture education
Superscalar out-of-order demystified in four instructions

WCAE '03 Proceedings of the 2003 workshop on Computer architecture education: Held in conjunction with the 30th International Symposium on Computer Architecture
The SimCore/Alpha Functional Simulator

WCAE '04 Proceedings of the 2004 workshop on Computer architecture education: held in conjunction with the 31st International Symposium on Computer Architecture
Design principles for a virtual multiprocessor

Proceedings of the 2007 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries
The AMD Opteron Northbridge Architecture

IEEE Micro
Scalable Dynamic Instruction Scheduler through Wake-Up Spatial Locality

IEEE Transactions on Computers
Dynamic tag reduction for low-power caches in embedded systems with virtual memory

International Journal of Parallel Programming
Hiding the misprediction penalty of a resource-efficient high-performance processor

ACM Transactions on Architecture and Code Optimization (TACO)
On-Demand Solution to Minimize I-Cache Leakage Energy with Maintaining Performance

IEEE Transactions on Computers
Predicting and Exploiting Transient Values for Reducing Register File Pressure and Energy Consumption

IEEE Transactions on Computers
Optimal Power/Performance Pipeline Depth for SMT in Scaled Technologies

IEEE Transactions on Computers
Improving the performance of object-oriented languages with dynamic predication of indirect jumps

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Accurate branch prediction for short threads

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Accurate critical path prediction via random trace construction

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Dependability, power, and performance trade-off on a multicore processor

Proceedings of the 2008 Asia and South Pacific Design Automation Conference
Trends toward on-chip networked microsystems

International Journal of High Performance Computing and Networking
Hardware support for early register release

International Journal of High Performance Computing and Networking
High-performance and low-power VLIW cores for numerical computations

International Journal of High Performance Computing and Networking
Variable latency caches for nanoscale processor

Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Low-power clock distribution in a multilayer core 3D microprocessor

Proceedings of the 18th ACM Great Lakes symposium on VLSI
Reducing the impact of intra-core process variability with criticality-based resource allocation and prefetching

Proceedings of the 5th conference on Computing frontiers
Compiler-directed frequency and voltage scaling for a multiple clock domain microarchitecture

Proceedings of the 5th conference on Computing frontiers
Asymmetrically banked value-aware register files for low-energy and high-performance

Microprocessors & Microsystems
A distributed, simultaneously multi-threaded (SMT) processor with clustered scheduling windows for scalable DSP performance

Journal of Signal Processing Systems - Special Issue: Embedded computing systems for DSP
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
A physical level study and optimization of CAM-based checkpointed register alias table

Proceedings of the 13th international symposium on Low power electronics and design
Power-efficient clustering via incomplete bypassing

Proceedings of the 13th international symposium on Low power electronics and design
A low-complexity microprocessor design with speculative pre-execution

Journal of Systems Architecture: the EUROMICRO Journal
Speculative return address stack management revisited

ACM Transactions on Architecture and Code Optimization (TACO)
Cross-layer customization for rapid and low-cost task preemption in multitasked embedded systems

ACM Transactions on Embedded Computing Systems (TECS)
HeDGE: Hybrid Dataflow Graph Execution in the Issue Logic

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Shapeshifter: Dynamically changing pipeline width and speed to address process variations

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
(When) Will CMPs Hit the Power Wall?

Euro-Par 2008 Workshops - Parallel Processing
A complexity-effective microprocessor design with decoupled dispatch queues and prefetching

Parallel Computing
Accurate Instruction Pre-scheduling in Dynamically Scheduled Processors

Transactions on High-Performance Embedded Architectures and Compilers II
Reexecution and Selective Reuse in Checkpoint Processors

Transactions on High-Performance Embedded Architectures and Compilers II
Implementing a 1GHz four-issue out-of-order execution microprocessor in a standard cell ASIC methodology

Journal of Computer Science and Technology
Techniques for leakage energy reduction in deep submicrometer cache memories

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Temperature-constrained power control for chip multiprocessors with online model estimation

Proceedings of the 36th annual international symposium on Computer architecture
Simultaneous speculative threading: a novel pipeline architecture implemented in Sun's Rock processor

Proceedings of the 36th annual international symposium on Computer architecture
Checkpoint allocation and release

ACM Transactions on Architecture and Code Optimization (TACO)
Design and optimization of the store vectors memory dependence predictor

ACM Transactions on Architecture and Code Optimization (TACO)
Access region cache with register guided memory reference partitioning

Journal of Systems Architecture: the EUROMICRO Journal
POWER4 system microarchitecture

IBM Journal of Research and Development
McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
On reducing load/store latencies of cache accesses

Journal of Systems Architecture: the EUROMICRO Journal
Saturating counter design for meta predictor in hybrid branch prediction

CSECS'09 Proceedings of the 8th WSEAS International Conference on Circuits, systems, electronics, control & signal processing
Evaluating the performance of space plasma simulations using FPGA's

VECPAR'02 Proceedings of the 5th international conference on High performance computing for computational science
Dynamic register-renaming scheme for reducing power-density and temperature

Proceedings of the 2010 ACM Symposium on Applied Computing
Decoupled state-execute architecture

ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Using a way cache to improve performance of set-associative caches

ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
A power-aware hybrid RAM-CAM renaming mechanism for fast recovery

ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Necromancer: enhancing system throughput by animating dead cores

Proceedings of the 37th annual international symposium on Computer architecture
On the latency and energy of checkpointed superscalar register alias tables

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A novel meta predictor design for hybrid branch prediction

WSEAS Transactions on Computers
Register-relocation: a thermal-aware renaming method for reducing temperature of a register file

ACM SIGAPP Applied Computing Review
Exploiting narrow-width values for thermal-aware register file designs

Proceedings of the Conference on Design, Automation and Test in Europe
Compatible phase co-scheduling on a CMP of multi-threaded processors

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Register Cache System Not for Latency Reduction Purpose

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
STEM: Spatiotemporal Management of Capacity for Intra-core Last Level Caches

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Wake-up logic optimizations through selective match and wakeup range limitation

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Energy-efficient hardware data prefetching

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
CPPC: correctable parity protected cache

Proceedings of the 38th annual international symposium on Computer architecture
Thermal-aware floorplan schemes for reliable 3D multi-core processors

ICCSA'11 Proceedings of the 2011 international conference on Computational science and its applications - Volume Part II
Natural instruction level parallelism-aware compiler for high-performance QueueCore processor architecture

The Journal of Supercomputing
CoreSymphony: an efficient reconfigurable multi-core architecture

ACM SIGARCH Computer Architecture News
Using branch prediction information for near-optimal i-cache leakage

ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
An architectural leakage power reduction method for instruction cache in ultra deep submicron microprocessors

ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
2L-MuRR: a compact register renaming scheme for SMT processors

ISPA'05 Proceedings of the Third international conference on Parallel and Distributed Processing and Applications
Low power microprocessor design for embedded systems

ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part IV
CRAM: coded registers for amplified multiporting

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Achieving reliable system performance by fast recovery of branch miss prediction

Journal of Network and Computer Applications
A memory bandwidth effective cache store miss policy

ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
A scalable, multi-thread, multi-issue array processor architecture for DSP applications based on extended Tomasulo scheme

SAMOS'06 Proceedings of the 6th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Exploiting narrow values for energy efficiency in the register files of superscalar microprocessors

PATMOS'06 Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
RELOCATE: register file local access pattern redistribution mechanism for power and thermal management in out-of-order embedded processor

HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Complexity-Effective rename table design for rapid speculation recovery

ARCS'10 Proceedings of the 23rd international conference on Architecture of Computing Systems
Exploiting inactive rename slots for detecting soft errors

ARCS'10 Proceedings of the 23rd international conference on Architecture of Computing Systems
Designing for dark silicon: a methodological perspective on energy efficient systems

Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Adaptive dynamic frequency scaling for thermal-aware 3D multi-core processors

ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part IV
Active memory controller

The Journal of Supercomputing
Thermal-aware task scheduling in 3D chip multiprocessor with real-time constrained workloads

ACM Transactions on Embedded Computing Systems (TECS) - Special issue on embedded systems for interactive multimedia services (ES-IMS)
The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing

ACM Transactions on Architecture and Code Optimization (TACO)
LUCAS: latency-adaptive unified cluster assignment and instruction scheduling

Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
AVICA: an access-time variation insensitive L1 cache architecture

Proceedings of the Conference on Design, Automation and Test in Europe
Exploiting replicated checkpoints for soft error detection and correction

Proceedings of the Conference on Design, Automation and Test in Europe
IVF: characterizing the vulnerability of microprocessor structures to intermittent faults

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Statistical thermal modeling and optimization considering leakage power variations

DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Multi-core systems modeling for formal verification of parallel algorithms

ACM SIGOPS Operating Systems Review
Low Cost Concurrent Error Detection Strategy for the Control Logic of High Performance Microprocessors and Its Application to the Instruction Decoder

Journal of Electronic Testing: Theory and Applications
Modular multi-ported SRAM-based memories

Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays
CAeSaR: unified cluster-assignment scheduling and communication reuse for clustered VLIW processors

Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems

Abstract

The third-generation Alpha microprocessor from Compaq Computer Corporation (formerly Digital Equipment) is the 21264. This microprocessor can execute 2.0-2.4 billion instructions per second at a 500-600 MHz clock rate in a 0.35 µm CMOS process, resulting in industry-leading performance of 30+ SPECint95 and 58+ SPECfp95 in early system offerings. This paper focuses on the overall 21264 architecture and on the particular architectural techniques used to achieve these performance levels, including many forms of out-of-order and speculative execution and a high-performance memory system.