The Alpha 21264: A 500 MHz Out-of-Order Execution Microprocessor
COMPCON '97 Proceedings of the 42nd IEEE International Computer Conference
The Alpha 21264 Microprocessor Architecture
ICCD '98 Proceedings of the International Conference on Computer Design
Circuit Implementation of a 600MHz Superscalar RISC Microprocessor
ICCD '98 Proceedings of the International Conference on Computer Design
Delaying physical register allocation through virtual-physical registers
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
The use of multithreading for exception handling
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Selective cache ways: on-demand cache resource allocation
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
The Multicluster Architecture: Reducing Processor Cycle Time Through Partitioning
International Journal of Parallel Programming
Performance analysis of the Alpha 21264-based Compaq ES40 system
Proceedings of the 27th annual international symposium on Computer architecture
Circuits for wide-window superscalar processors
Proceedings of the 27th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Multiple-banked register file architectures
Proceedings of the 27th annual international symposium on Computer architecture
Optimization of high-performance superscalar architectures for energy efficiency
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
Architecture and design of AlphaServer GS320
ACM SIGPLAN Notices
The impact of delay on the design of branch predictors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Two-level hierarchical register file organization for VLIW processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Register integration: a simple and efficient implementation of squash reuse
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
A circuit level implementation of an adaptive issue queue for power-aware microprocessors
GLSVLSI '01 Proceedings of the 11th Great Lakes symposium on VLSI
Inherently Lower-Power High-Performance Superscalar Architectures
IEEE Transactions on Computers
Architecture and design of AlphaServer GS320
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Dynamically allocating processor resources between nearby and distant ILP
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Measuring experimental error in microprocessor simulation
SSR '01 Proceedings of the 2001 symposium on Software reusability: putting software reuse in context
Energy reduction in queues and stacks by adaptive bitwidth compression
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Workload characterization of emerging computer applications
A flexible accelerator for layer 7 networking applications
Proceedings of the 39th annual Design Automation Conference
Low-complexity reorder buffer architecture
ICS '02 Proceedings of the 16th international conference on Supercomputing
Profile-guided post-link stride prefetching
ICS '02 Proceedings of the 16th international conference on Supercomputing
Bloom filtering cache misses for accurate data speculation and prefetching
ICS '02 Proceedings of the 16th international conference on Supercomputing
Increasing processor performance by implementing deeper pipelines
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A large, fast instruction window for tolerating cache misses
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
An instruction set and microarchitecture for instruction level distributed processing
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A scalable instruction queue design using dependence chains
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Multithreading decoupled architectures for complexity-effective general purpose computing
ACM SIGARCH Computer Architecture News - Special Issue: PACT 2001 workshops
A design space evaluation of grid processor architectures
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Direct load: dependence-linked dataflow resolution of load address and cache coordinate
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Reducing the complexity of the register file in dynamic superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Measuring Experimental Error in Microprocessor Simulation
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Fine-grain CAM-tag cache resizing using miss tags
Proceedings of the 2002 international symposium on Low power electronics and design
Neural methods for dynamic branch prediction
ACM Transactions on Computer Systems (TOCS)
NetBench: a benchmarking suite for network processors
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Control-Flow Speculation through Value Prediction
IEEE Transactions on Computers
Access Control Mechanisms in a Distributed, Persistent Memory System
IEEE Transactions on Parallel and Distributed Systems
Typing the ISA to cluster the processor
Future Generation Computer Systems - Parallel computing technologies (PaCT-2001)
An Adaptive Issue Queue for Reduced Power at High Performance
PACS '00 Proceedings of the First International Workshop on Power-Aware Computer Systems-Revised Papers
Typing the ISA to Cluster the Processor
PaCT '01 Proceedings of the 6th International Conference on Parallel Computing Technologies
High Performance and Energy Efficient Serial Prefetch Architecture
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Integrated I-cache Way Predictor and Branch Target Buffer to Reduce Energy Consumption
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
A Programmable Memory Hierarchy for Prefetching Linked Data Structures
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Speculative Clustered Caches for Clustered Processors
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Improving Conditional Branch Prediction on Speculative Multithreading Architectures
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
A Register File Architecture and Compilation Scheme for Clustered ILP Processors
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Reuse Distance-Based Cache Hint Selection
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Applying Machine Learning for Ensemble Branch Predictors
IEA/AIE '02 Proceedings of the 15th international conference on Industrial and engineering applications of artificial intelligence and expert systems: developments in applied artificial intelligence
Reordering Memory Bus Transactions for Reduced Power Consumption
LCTES '00 Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems
Cached Two-Level Adaptive Branch Predictors with Multiple Stages
ARCS '02 Proceedings of the International Conference on Architecture of Computing Systems: Trends in Network and Pervasive Computing
Hierarchical Scheduling Windows
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Three extensions to register integration
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Master/slave speculative parallelization
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Three-dimensional memory vectorization for high bandwidth media memory systems
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Managing static leakage energy in microprocessor functional units
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Exploiting data-width locality to increase superscalar execution bandwidth
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Dynamic binary translation for accumulator-oriented architectures
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Phi-Predication for light-weight if-conversion
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
On the Design of a High-Performance Adaptive Router for CC-NUMA Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Predicate prediction for efficient out-of-order execution
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Power-efficient issue queue design
Power aware computing
Front-End Policies for Improved Issue Efficiency in SMT Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
A Statistically Rigorous Approach for Improving Simulation Methodology
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Reconsidering Complex Branch Predictors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Banked multiported register files for high-frequency superscalar microprocessors
Proceedings of the 30th annual international symposium on Computer architecture
Effective ahead pipelining of instruction block address generation
Proceedings of the 30th annual international symposium on Computer architecture
Dynamically managing the communication-parallelism trade-off in future clustered processors
Proceedings of the 30th annual international symposium on Computer architecture
Overcoming the limitations of conventional vector processors
Proceedings of the 30th annual international symposium on Computer architecture
Reducing reorder buffer complexity through selective operand caching
Proceedings of the 2003 international symposium on Low power electronics and design
Reducing data cache energy consumption via cached load/store queue
Proceedings of the 2003 international symposium on Low power electronics and design
On load latency in low-power caches
Proceedings of the 2003 international symposium on Low power electronics and design
Microprocessor pipeline energy analysis
Proceedings of the 2003 international symposium on Low power electronics and design
Address-free memory access based on program syntax correlation of loads and stores
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on the 2001 international conference on computer design (ICCD)
An Experimental Study of Polylogarithmic, Fully Dynamic, Connectivity Algorithms
Journal of Experimental Algorithmics (JEA)
Beating in-order stalls with "flea-flicker" two-pass pipelining
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Exploiting Value Locality in Physical Register Files
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Fast Path-Based Neural Branch Prediction
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Design and Implementation of High-Performance Memory Systems for Future Packet Buffers
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Two-level branch prediction using neural networks
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Synthesis and verification
Reducing register pressure through LAER algorithm
ACSC '04 Proceedings of the 27th Australasian conference on Computer science - Volume 26
TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP
ACM Transactions on Architecture and Code Optimization (TACO)
Complexity-Effective Reorder Buffer Designs for Superscalar Processors
IEEE Transactions on Computers
Isolating Short-Lived Operands for Energy Reduction
IEEE Transactions on Computers
A general decomposition strategy for verifying register renaming
Proceedings of the 41st annual Design Automation Conference
Energy Efficient Comparators for Superscalar Datapaths
IEEE Transactions on Computers
Scaling the issue window with look-ahead latency prediction
Proceedings of the 18th annual international conference on Supercomputing
Back-end assignment schemes for clustered multithreaded processors
Proceedings of the 18th annual international conference on Supercomputing
Cluster prefetch: tolerating on-chip wire delays in clustered microarchitectures
Proceedings of the 18th annual international conference on Supercomputing
Wire Delay is Not a Problem for SMT (In the Near Future)
Proceedings of the 31st annual international symposium on Computer architecture
SMTp: An Architecture for Next-generation Scalable Multi-threading
Proceedings of the 31st annual international symposium on Computer architecture
Adaptive Cache Compression for High-Performance Processors
Proceedings of the 31st annual international symposium on Computer architecture
Use-Based Register Caching with Decoupled Indexing
Proceedings of the 31st annual international symposium on Computer architecture
Late Allocation and Early Release of Physical Registers
IEEE Transactions on Computers
Reducing Power Consumption for High-Associativity Data Caches in Embedded Processors
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
A scalable, clustered SMT processor for digital signal processing
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
Scalable selective re-execution for EDGE architectures
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Cache Refill/Access Decoupling for Vector Machines
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Using a serial cache for energy efficient instruction fetching
Journal of Systems Architecture: the EUROMICRO Journal
Tolerating memory latency through push prefetching for pointer-intensive applications
ACM Transactions on Architecture and Code Optimization (TACO)
Increasing Register File Immunity to Transient Errors
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Memory coherence activity prediction in commercial workloads
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Cache organizations for clustered microarchitectures
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Scalable cache memory design for large-scale SMT architectures
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Load elimination for low-power embedded processors
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
A Speculative Control Scheme for an Energy-Efficient Banked Register File
IEEE Transactions on Computers
Improved latency and accuracy for neural branch prediction
ACM Transactions on Computer Systems (TOCS)
Exploiting temporal locality in drowsy cache policies
Proceedings of the 2nd conference on Computing frontiers
IBM Journal of Research and Development - Electrochemical technology in microelectronics
Code placement for improving dynamic branch prediction accuracy
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Energy-effcient physically tagged caches for embedded processors with virtual memory
Proceedings of the 42nd annual Design Automation Conference
Analysis of the O-GEometric History Length Branch Predictor
Proceedings of the 32nd annual international symposium on Computer Architecture
Snug set-associative caches: reducing leakage power while improving performance
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Reducing latencies of pipelined cache accesses through set prediction
Proceedings of the 19th annual international conference on Supercomputing
An asymmetric clustered processor based on value content
Proceedings of the 19th annual international conference on Supercomputing
Exploring the limits of leakage power reduction in caches
ACM Transactions on Architecture and Code Optimization (TACO)
Restrictive Compression Techniques to Increase Level 1 Cache Capacity
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Load-Store Queue Management: an Energy-Efficient Design Based on a State-Filtering Mechanism.
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Reducing the Energy of Speculative Instruction Schedulers
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
A Criticality Analysis of Clustering in Superscalar Processors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Scalable Store-Load Forwarding via Store Queue Index Prediction
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
"Flea-flicker" Multipass Pipelining: An Alternative to the High-Power Out-of-Order Offense
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Beating In-Order Stalls with "Flea-Flicker" Two-Pass Pipelining
IEEE Transactions on Computers
Journal of Systems Architecture: the EUROMICRO Journal
Software and hardware techniques to optimize register file utilization in VLIW architectures
International Journal of Parallel Programming
Compiling for EDGE Architectures
Proceedings of the International Symposium on Code Generation and Optimization
Dynamic thread assignment on heterogeneous multiprocessor architectures
Proceedings of the 3rd conference on Computing frontiers
Speculative early register release
Proceedings of the 3rd conference on Computing frontiers
Vulnerability analysis of L2 cache elements to single event upsets
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Intelligent memory manager: reducing cache pollution due to memory management functions
Journal of Systems Architecture: the EUROMICRO Journal
Nooks: an architecture for reliable device drivers
EW 10 Proceedings of the 10th workshop on ACM SIGOPS European workshop
Proceedings of the 33rd annual international symposium on Computer Architecture
Reducing Rename Logic Complexity for High-Speed and Low-Power Front-End Architectures
IEEE Transactions on Computers
Decomposing the load-store queue by function for power reduction and scalability
IBM Journal of Research and Development
International Journal of Parallel Programming
Microarchitecture of the Godson-2 processor
Journal of Computer Science and Technology
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Overlapping dependent loads with addressless preload
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Long-latency branches: how much do they matter?
ACM SIGARCH Computer Architecture News
Throttling-Based Resource Management in High Performance Multithreaded Architectures
IEEE Transactions on Computers
Early Register Deallocation Mechanisms Using Checkpointed Register Files
IEEE Transactions on Computers
A case for a complexity-effective, width-partitioned microarchitecture
ACM Transactions on Architecture and Code Optimization (TACO)
Reducing cache traffic and energy with macro data load
Proceedings of the 2006 international symposium on Low power electronics and design
Register file caching for energy efficiency
Proceedings of the 2006 international symposium on Low power electronics and design
Design space exploration for multicore architectures: a power/performance/thermal view
Proceedings of the 20th annual international conference on Supercomputing
Reducing Data Cache Susceptibility to Soft Errors
IEEE Transactions on Dependable and Secure Computing
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
NoSQ: Store-Load Communication without a Store Queue
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Distributed Microarchitectural Protocols in the TRIPS Prototype Processor
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Mitigating the Impact of Process Variations on Processor Register Files and Execution Units
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
ACM Transactions on Architecture and Code Optimization (TACO)
Register port complexity reduction in wide-issue processors with selective instruction execution
Microprocessors & Microsystems
A comparison of two policies for issuing instructions speculatively
Journal of Systems Architecture: the EUROMICRO Journal
Compacting register file via 2-level renaming and bit-partitioning
Microprocessors & Microsystems
A predictive decode filter cache for reducing power consumption in embedded processors
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Microarchitecture parameter selection to optimize system performance under process variation
Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design
Reducing I-cache energy of multimedia applications through low cost tag comparison elimination
Journal of Embedded Computing - Cache exploitation in embedded systems
Reducing non-deterministic loads in low-power caches via early cache set resolution
Microprocessors & Microsystems
Core fusion: accommodating software diversity in chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
ReCycle:: pipeline adaptation to tolerate process variation
Proceedings of the 34th annual international symposium on Computer architecture
VPC prediction: reducing the cost of indirect branches via hardware-based dynamic devirtualization
Proceedings of the 34th annual international symposium on Computer architecture
Profile-assisted Compiler Support for Dynamic Predication in Diverge-Merge Processors
Proceedings of the International Symposium on Code Generation and Optimization
Implementation and Evaluation of a Dynamically Routed Processor Operand Network
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
SATSim: a superscalar architecture trace simulator using interactive animation
WCAE '00 Proceedings of the 2000 workshop on Computer architecture education
Superscalar out-of-order demystified in four instructions
WCAE '03 Proceedings of the 2003 workshop on Computer architecture education: Held in conjunction with the 30th International Symposium on Computer Architecture
The SimCore/Alpha Functional Simulator
WCAE '04 Proceedings of the 2004 workshop on Computer architecture education: held in conjunction with the 31st International Symposium on Computer Architecture
Design principles for a virtual multiprocessor
Proceedings of the 2007 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries
The AMD Opteron Northbridge Architecture
IEEE Micro
Scalable Dynamic Instruction Scheduler through Wake-Up Spatial Locality
IEEE Transactions on Computers
Dynamic tag reduction for low-power caches in embedded systems with virtual memory
International Journal of Parallel Programming
Hiding the misprediction penalty of a resource-efficient high-performance processor
ACM Transactions on Architecture and Code Optimization (TACO)
On-Demand Solution to Minimize I-Cache Leakage Energy with Maintaining Performance
IEEE Transactions on Computers
IEEE Transactions on Computers
Optimal Power/Performance Pipeline Depth for SMT in Scaled Technologies
IEEE Transactions on Computers
Improving the performance of object-oriented languages with dynamic predication of indirect jumps
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Accurate branch prediction for short threads
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Accurate critical path prediction via random trace construction
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Dependability, power, and performance trade-off on a multicore processor
Proceedings of the 2008 Asia and South Pacific Design Automation Conference
Trends toward on-chip networked microsystems
International Journal of High Performance Computing and Networking
Hardware support for early register release
International Journal of High Performance Computing and Networking
High-performance and low-power VLIW cores for numerical computations
International Journal of High Performance Computing and Networking
Variable latency caches for nanoscale processor
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Low-power clock distribution in a multilayer core 3d microprocessor
Proceedings of the 18th ACM Great Lakes symposium on VLSI
Proceedings of the 5th conference on Computing frontiers
Compiler-directed frequency and voltage scaling for a multiple clock domain microarchitecture
Proceedings of the 5th conference on Computing frontiers
Asymmetrically banked value-aware register files for low-energy and high-performance
Microprocessors & Microsystems
Journal of Signal Processing Systems - Special Issue: Embedded computing systems for DSP
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
A physical level study and optimization of CAM-based checkpointed register alias table
Proceedings of the 13th international symposium on Low power electronics and design
Power-efficient clustering via incomplete bypassing
Proceedings of the 13th international symposium on Low power electronics and design
A low-complexity microprocessor design with speculative pre-execution
Journal of Systems Architecture: the EUROMICRO Journal
Speculative return address stack management revisited
ACM Transactions on Architecture and Code Optimization (TACO)
Cross-layer customization for rapid and low-cost task preemption in multitasked embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
HeDGE: Hybrid Dataflow Graph Execution in the Issue Logic
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Shapeshifter: Dynamically changing pipeline width and speed to address process variations
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
(When) Will CMPs Hit the Power Wall?
Euro-Par 2008 Workshops - Parallel Processing
Accurate Instruction Pre-scheduling in Dynamically Scheduled Processors
Transactions on High-Performance Embedded Architectures and Compilers II
Reexecution and Selective Reuse in Checkpoint Processors
Transactions on High-Performance Embedded Architectures and Compilers II
Journal of Computer Science and Technology
Techniques for leakage energy reduction in deep submicrometer cache memories
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Temperature-constrained power control for chip multiprocessors with online model estimation
Proceedings of the 36th annual international symposium on Computer architecture
Proceedings of the 36th annual international symposium on Computer architecture
Checkpoint allocation and release
ACM Transactions on Architecture and Code Optimization (TACO)
Design and optimization of the store vectors memory dependence predictor
ACM Transactions on Architecture and Code Optimization (TACO)
Access region cache with register guided memory reference partitioning
Journal of Systems Architecture: the EUROMICRO Journal
POWER4 system microarchitecture
IBM Journal of Research and Development
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
On reducing load/store latencies of cache accesses
Journal of Systems Architecture: the EUROMICRO Journal
Saturating counter design for meta predictor in hybrid branch prediction
CSECS'09 Proceedings of the 8th WSEAS International Conference on Circuits, systems, electronics, control & signal processing
Evaluating the performance of space plasma simulations using FPGA's
VECPAR'02 Proceedings of the 5th international conference on High performance computing for computational science
Dynamic register-renaming scheme for reducing power-density and temperature
Proceedings of the 2010 ACM Symposium on Applied Computing
Decoupled state-execute architecture
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Using a way cache to improve performance of set-associative caches
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
A power-aware hybrid RAM-CAM renaming mechanism for fast recovery
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Necromancer: enhancing system throughput by animating dead cores
Proceedings of the 37th annual international symposium on Computer architecture
On the latency and energy of checkpointed superscalar register alias tables
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A novel meta predictor design for hybrid branch prediction
WSEAS Transactions on Computers
Register-relocation: a thermal-aware renaming method for reducing temperature of a register file
ACM SIGAPP Applied Computing Review
Exploiting narrow-width values for thermal-aware register file designs
Proceedings of the Conference on Design, Automation and Test in Europe
Compatible phase co-scheduling on a CMP of multi-threaded processors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Register Cache System Not for Latency Reduction Purpose
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
STEM: Spatiotemporal Management of Capacity for Intra-core Last Level Caches
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Wake-up logic optimizations through selective match and wakeup range limitation
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Energy-efficient hardware data prefetching
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
CPPC: correctable parity protected cache
Proceedings of the 38th annual international symposium on Computer architecture
Thermal-aware floorplan schemes for reliable 3D multi-core processors
ICCSA'11 Proceedings of the 2011 international conference on Computational science and its applications - Volume Part II
The Journal of Supercomputing
CoreSymphony: an efficient reconfigurable multi-core architecture
ACM SIGARCH Computer Architecture News
Using branch prediction information for near-optimal i-cache leakage
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
2L-MuRR: a compact register renaming scheme for SMT processors
ISPA'05 Proceedings of the Third international conference on Parallel and Distributed Processing and Applications
Low power microprocessor design for embedded systems
ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part IV
CRAM: coded registers for amplified multiporting
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Achieving reliable system performance by fast recovery of branch miss prediction
Journal of Network and Computer Applications
A memory bandwidth effective cache store miss policy
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
SAMOS'06 Proceedings of the 6th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Exploiting narrow values for energy efficiency in the register files of superscalar microprocessors
PATMOS'06 Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Complexity-Effective rename table design for rapid speculation recovery
ARCS'10 Proceedings of the 23rd international conference on Architecture of Computing Systems
Exploiting inactive rename slots for detecting soft errors
ARCS'10 Proceedings of the 23rd international conference on Architecture of Computing Systems
Designing for dark silicon: a methodological perspective on energy efficient systems
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Adaptive dynamic frequency scaling for thermal-aware 3d multi-core processors
ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part IV
The Journal of Supercomputing
Thermal-aware task scheduling in 3D chip multiprocessor with real-time constrained workloads
ACM Transactions on Embedded Computing Systems (TECS) - Special issue on embedded systems for interactive multimedia services (ES-IMS)
ACM Transactions on Architecture and Code Optimization (TACO)
LUCAS: latency-adaptive unified cluster assignment and instruction scheduling
Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
AVICA: an access-time variation insensitive L1 cache architecture
Proceedings of the Conference on Design, Automation and Test in Europe
Exploiting replicated checkpoints for soft error detection and correction
Proceedings of the Conference on Design, Automation and Test in Europe
IVF: characterizing the vulnerability of microprocessor structures to intermittent faults
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Statistical thermal modeling and optimization considering leakage power variations
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Multi-core systems modeling for formal verification of parallel algorithms
ACM SIGOPS Operating Systems Review
Journal of Electronic Testing: Theory and Applications
Modular multi-ported SRAM-based memories
Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays
CAeSaR: unified cluster-assignment scheduling and communication reuse for clustered VLIW processors
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Hi-index | 0.04 |
The third generation Alpha microprocessor from Compaq Computer Corporation (formerly Digital Equipment) is the 21264. This microprocessor can execute 2.0-2.4 billion instructions per second with a 500-600 MHz cycle time in a 0.35 um CMOS process, resulting in the industry-leading performance of 30+ SPECint95 and 58+ SPECfp95 in early system offerings. This paper focuses on the overall 21264 architecture, as well as many of the particular architectural techniques used to achieve these performance levels. These include many forms of out-of-order and speculative execution as well as a high-performance memory system.