The MIPS R10000 Superscalar Microprocessor

Authors:
Kenneth C. Yeager
Affiliations:
-
Venue:
IEEE Micro
Year:
1996

Citing 0
Cited 254

Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading

ACM Transactions on Computer Systems (TOCS)
Hardware fault containment in scalable shared-memory multiprocessors

Proceedings of the 24th annual international symposium on Computer architecture
Designing high bandwidth on-chip caches

Proceedings of the 24th annual international symposium on Computer architecture
Memory-system design considerations for dynamically-scheduled processors

Proceedings of the 24th annual international symposium on Computer architecture
Complexity-effective superscalar processors

Proceedings of the 24th annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server

Proceedings of the 24th annual international symposium on Computer architecture
Prediction caches for superscalar processors

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The multicluster architecture: reducing cycle time through partitioning

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Out-of-order vector architectures

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Initial results on the performance and cost of vector microprocessors

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
A performance study of out-of-order vector architectures and short registers

ICS '98 Proceedings of the 12th international conference on Supercomputing
Informing memory operations: memory performance feedback mechanisms and their applications

ACM Transactions on Computer Systems (TOCS)
Dynamic history-length fitting: a third level of adaptivity for branch prediction

Proceedings of the 25th annual international symposium on Computer architecture
Options for dynamic address translation in COMAs

Proceedings of the 25th annual international symposium on Computer architecture
Threaded multiple path execution

Proceedings of the 25th annual international symposium on Computer architecture
Selective eager execution on the PolyPath architecture

Proceedings of the 25th annual international symposium on Computer architecture
Development and validation of a hierarchical memory model incorporating CPU- and memory-operation overlap model

Proceedings of the 1st international workshop on Software and performance
Simple vector microprocessors for multimedia applications

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Cooperative prefetching: compiler and hardware support for effective instruction prefetching in modern processors

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
An empirical study of decentralized ILP execution models

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Fast out-of-order processor simulation using memoization

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Functional Implementation Techniques for CPU Cache Memories

IEEE Transactions on Computers - Special issue on cache memory and related problems
Automatic Compiler-Inserted Prefetching for Pointer-Based Applications

IEEE Transactions on Computers - Special issue on cache memory and related problems
Tolerating late memory traps in ILP processors

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Memory forwarding: enabling aggressive layout optimizations by guaranteeing the safety of data relocation

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Decoupling local variable accesses in a wide-issue superscalar processor

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Commit-reconcile & fences (CRF): a new memory model for architects and compiler writers

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Is SC + ILP = RC?

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Comparing the memory system performance of the HP V-class and SGI Origin 2000 multiprocessors using microbenchmarks and scientific applications

ICS '99 Proceedings of the 13th international conference on Supercomputing
Way-predicting set-associative cache for high performance and low energy consumption

ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Concurrent Event Handling through Multithreading

IEEE Transactions on Computers
Access region locality for high-bandwidth processor memory system design

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
The use of multithreading for exception handling

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Software-Directed Register Deallocation for Simultaneous Multithreaded Processors

IEEE Transactions on Parallel and Distributed Systems
The Multicluster Architecture: Reducing Processor Cycle Time Through Partitioning

International Journal of Parallel Programming
Using complete system simulation to characterize SPECjvm98 benchmarks

Proceedings of the 14th international conference on Supercomputing
Understanding Why Correlation Profiling Improves the Predictability of Data Cache Misses in Nonnumeric Applications

IEEE Transactions on Computers
Efficient performance prediction for modern microprocessors

Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A scalable approach to thread-level speculation

Proceedings of the 27th annual international symposium on Computer architecture
A fully associative software-managed cache design

Proceedings of the 27th annual international symposium on Computer architecture
Design and Evaluation of a Switch Cache Architecture for CC-NUMA Multiprocessors

IEEE Transactions on Computers
FLASH vs. (simulated) FLASH: closing the simulation loop

ACM SIGPLAN Notices
Data prefetch mechanisms

ACM Computing Surveys (CSUR)
Silent stores for free

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
On pipelining dynamic instruction scheduling logic

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Register integration: a simple and efficient implementation of squash reuse

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Architectural and compiler support for effective instruction prefetching: a cooperative approach

ACM Transactions on Computer Systems (TOCS)
A circuit level implementation of an adaptive issue queue for power-aware microprocessors

GLSVLSI '01 Proceedings of the 11th Great Lakes symposium on VLSI
Inherently Lower-Power High-Performance Superscalar Architectures

IEEE Transactions on Computers
High Bandwidth On-Chip Cache Design

IEEE Transactions on Computers
Improving index performance through prefetching

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Improving Gang Scheduling through job performance analysis and malleability

ICS '01 Proceedings of the 15th international conference on Supercomputing
Integrating superscalar processor components to implement register caching

ICS '01 Proceedings of the 15th international conference on Supercomputing
FLASH vs. (Simulated) FLASH: closing the simulation loop

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Dynamically allocating processor resources between nearby and distant ILP

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
L1 data cache decomposition for energy efficiency

ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Energy reduction in queues and stacks by adaptive bitwidth compression

ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
A High-Bandwidth Memory Pipeline for Wide Issue Processors

IEEE Transactions on Computers
Cost-Conscious Strategies to Increase Performance of Numerical Programs on Aggressive VLIW Architectures

IEEE Transactions on Computers
Improving Latency Tolerance of Multithreading through Decoupling

IEEE Transactions on Computers
Asynchrony in parallel computing: from dataflow to multithreading

Progress in computer research
SMT Layout Overhead and Scalability

IEEE Transactions on Parallel and Distributed Systems
Performance evaluation of the SGI Origin2000: a memory-centric characterization of LANL ASCI applications

SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Measuring memory hierarchy performance of cache-coherent multiprocessors using micro benchmarks

SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Full-system timing-first simulation

SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Characterizing operating system activity in SPECjvm98 Benchmarks

Workload characterization of emerging computer applications
Bloom filtering cache misses for accurate data speculation and prefetching

ICS '02 Proceedings of the 16th international conference on Supercomputing
Select-free instruction scheduling logic

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
A high-speed dynamic instruction scheduling scheme for superscalar processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Reducing the complexity of the register file in dynamic superscalar processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Correctly implementing value prediction in microprocessors that support multithreading or multiprocessing

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Fractal prefetching B+-Trees: optimizing both cache and disk performance

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Energy-efficient hybrid wakeup logic

Proceedings of the 2002 international symposium on Low power electronics and design
Register tiling in nonrectangular iteration spaces

ACM Transactions on Programming Languages and Systems (TOPLAS)
Integrating non-blocking synchronisation in parallel applications: performance advantages and methodologies

WOSP '02 Proceedings of the 3rd international workshop on Software and performance
A Simulation Study of Decoupled Vector Architectures

The Journal of Supercomputing
Speculative synchronization: applying thread-level speculation to explicitly parallel applications

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Understanding and improving operating system effects in control flow prediction

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Compiler optimization of scalar value communication between speculative threads

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Dynamic dead-instruction detection and elimination

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
A Single-Chip Multiprocessor

Computer
Compilers for Instruction-Level Parallelism

Computer
Using Simple Tools to Evaluate Complex Architectural Trade-offs

IEEE Micro
Optimizing Main-Memory Join on Modern Hardware

IEEE Transactions on Knowledge and Data Engineering
Access Control Mechanisms in a Distributed, Persistent Memory System

IEEE Transactions on Parallel and Distributed Systems
Improving the Precise Interrupt Mechanism of Software-Managed TLB Miss Handlers

HiPC '01 Proceedings of the 8th International Conference on High Performance Computing
Selective Register Renaming: A Compiler-Driven Approach to Dynamic Register Renaming

HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
Run-Time Support to Register Allocation for Loop Parallelization of Image Processing Programs

HPCN Europe 2000 Proceedings of the 8th International Conference on High-Performance Computing and Networking
Conflict-Free Access to Multiple Single-Ported Register Files

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Hierarchical Interconnects for On-Chip Clustering

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
An Adaptive Issue Queue for Reduced Power at High Performance

PACS '00 Proceedings of the First International Workshop on Power-Aware Computer Systems-Revised Papers
Efficient Interconnects for Clustered Microarchitectures

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Speculative Sequential Consistency with Little Custom Storage

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
A Statistical-Empirical Hybrid Approach to Hierarchical Memory Analysis

Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Considerations for Scalable CAE on the SGI ccNUMA Architecture

HPCN Europe 2000 Proceedings of the 8th International Conference on High-Performance Computing and Networking
Execution Latency Reduction via Variable Latency Pipeline and Instruction Reuse

Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Cherry: checkpointed early resource recycling in out-of-order microprocessors

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Three extensions to register integration

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Enhancing memory level parallelism via recovery-free value prediction

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Power-efficient issue queue design

Power aware computing
The Hierarchical Multi-Bank DRAM: A High-Performance Architecture for Memory Integrated with Processors

ARVLSI '97 Proceedings of the 17th Conference on Advanced Research in VLSI (ARVLSI '97)
Front-End Policies for Improved Issue Efficiency in SMT Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
A Statistically Rigorous Approach for Improving Simulation Methodology

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Dynamic Data Dependence Tracking and its Application to Branch Prediction

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Instruction-level parallel processors-dynamic and static scheduling tradeoffs

PAS '97 Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis
Simultaneous Multithreading-Based Routers

ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
Visualizing Application Behavior on Superscalar Processors

INFOVIS '99 Proceedings of the 1999 IEEE Symposium on Information Visualization
Token coherence: decoupling performance and correctness

Proceedings of the 30th annual international symposium on Computer architecture
Cyclone: a broadcast-free dynamic instruction scheduler with selective replay

Proceedings of the 30th annual international symposium on Computer architecture
On load latency in low-power caches

Proceedings of the 2003 international symposium on Low power electronics and design
Low cost instruction cache designs for tag comparison elimination

Proceedings of the 2003 international symposium on Low power electronics and design
Checkpointing alternatives for high performance, power-aware processors

Proceedings of the 2003 international symposium on Low power electronics and design
Branch prediction on demand: an energy-efficient solution

Proceedings of the 2003 international symposium on Low power electronics and design
Analyzing the Individual/Combined Effects of Speculative and Guarded Execution on a Superscalar Architecture

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Multiple-path execution for chip multiprocessors

Journal of Systems Architecture: the EUROMICRO Journal
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Near-Optimal Precharging in High-Performance Nanoscale CMOS Caches

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Analysis of the impact of different methods for division/square root computation in the performance of a superscalar microprocessor

Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Synthesis and verification
Scheduling Reusable Instructions for Power Reduction

Proceedings of the conference on Design, automation and test in Europe - Volume 1
Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Leakage Energy Reduction in Register Renaming

ICDCSW '04 Proceedings of the 24th International Conference on Distributed Computing Systems Workshops - W7: EC (ICDCSW'04) - Volume 7
Reducing register pressure through LAER algorithm

ACSC '04 Proceedings of the 27th Australasian conference on Computer science - Volume 26
Tolerating Late Memory Traps in Dynamically Scheduled Processors

IEEE Transactions on Computers
A general decomposition strategy for verifying register renaming

Proceedings of the 41st annual Design Automation Conference
Memory Ordering: A Value-Based Approach

Proceedings of the 31st annual international symposium on Computer architecture
SMTp: An Architecture for Next-generation Scalable Multi-threading

Proceedings of the 31st annual international symposium on Computer architecture
Physical Register Inlining

Proceedings of the 31st annual international symposium on Computer architecture
Impact of technology scaling on energy aware execution cache-based microarchitectures

Proceedings of the 2004 international symposium on Low power electronics and design
Mixed-clock issue queue design for energy aware, high-performance cores

Proceedings of the 2004 Asia and South Pacific Design Automation Conference
Late Allocation and Early Release of Physical Registers

IEEE Transactions on Computers
How accurate should early design stage power/performance tools be? A case study with statistical simulation

Journal of Systems and Software - Special issue: Performance modeling and analysis of computer systems and networks
Coherence decoupling: making use of incoherence

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Area and System Clock Effects on SMT/CMP Throughput

IEEE Transactions on Computers
Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
On-Chip Interconnects and Instruction Steering Schemes for Clustered Microarchitectures

IEEE Transactions on Parallel and Distributed Systems
An analysis of a resource efficient checkpoint architecture

ACM Transactions on Architecture and Code Optimization (TACO)
Execution cache-based microarchitecture power-efficient superscalar processors

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Increased Scalability and Power Efficiency by Using Multiple Speed Pipelines

Proceedings of the 32nd annual international symposium on Computer Architecture
Store Vulnerability Window (SVW): Re-Execution Filtering for Enhanced Load Optimization

Proceedings of the 32nd annual international symposium on Computer Architecture
Dynamic Verification of Sequential Consistency

Proceedings of the 32nd annual international symposium on Computer Architecture
Enhancing Memory-Level Parallelism via Recovery-Free Value Prediction

IEEE Transactions on Computers
Moving Address Translation Closer to Memory in Distributed Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
The STAMPede approach to thread-level speculation

ACM Transactions on Computer Systems (TOCS)
Optimistic intra-transaction parallelism on chip multiprocessors

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Fast branch misprediction recovery in out-of-order superscalar processors

Proceedings of the 19th annual international conference on Supercomputing
Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Memory State Compressors for Giga-Scale Checkpoint/Restore

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Improving Computer Architecture Simulation Methodology by Adding Statistical Rigor

IEEE Transactions on Computers
Scalability Aspects of Instruction Distribution Algorithms for Clustered Processors

IEEE Transactions on Parallel and Distributed Systems
Error-tolerance memory Microarchitecture via Dynamic Multithreading

ICCD '05 Proceedings of the 2005 International Conference on Computer Design
A New Pointer-based Instruction Queue Design and Its Power-Performance Evaluation

ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Instruction Replication for Reducing Delays Due to Inter-PE Communication Latency

IEEE Transactions on Computers
Address-Indexed Memory Disambiguation and Store-to-Load Forwarding

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Cherry-MP: Correctly Integrating Checkpointed Early Resource Recycling in Chip Multiprocessors

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
On the correctness of program execution when cache coherence is maintained locally at data-sharing boundaries in distributed shared memory multiprocessors

International Journal of Parallel Programming
An experimental evaluation of the HP V-class and SGI origin 2000 multiprocessors using microbenchmarks and scientific applications

International Journal of Parallel Programming
Speculative early register release

Proceedings of the 3rd conference on Computing frontiers
In-Line Interrupt Handling and Lock-Up Free Translation Lookaside Buffers (TLBs)

IEEE Transactions on Computers
Tolerating Dependences Between Large Speculative Threads Via Sub-Threads

Proceedings of the 33rd annual international symposium on Computer Architecture
A combined DMA and application-specific prefetching approach for tackling the memory latency bottleneck

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Microarchitecture of the Godson-2 processor

Journal of Computer Science and Technology
SEED: scalable, efficient enforcement of dependences

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
SPARTAN: speculative avoidance of register allocations to transient values for performance and energy efficiency

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Early Register Deallocation Mechanisms Using Checkpointed Register Files

IEEE Transactions on Computers
A regulated transitive reduction (RTR) for longer memory race recording

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
High-level power analysis for multi-core chips

CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
BranchTap: improving performance with very few checkpoints through adaptive speculation control

Proceedings of the 20th annual international conference on Supercomputing
OS-Aware Branch Prediction: Improving Microprocessor Control Flow Prediction for Operating Systems

IEEE Transactions on Computers
I-cache multi-banking and vertical interleaving

Proceedings of the 17th ACM Great Lakes symposium on VLSI
Accelerating sequential programs on Chip Multiprocessors via Dynamic Prefetching Thread

Microprocessors & Microsystems
Reducing I-cache energy of multimedia applications through low cost tag comparison elimination

Journal of Embedded Computing - Cache exploitation in embedded systems
Reducing non-deterministic loads in low-power caches via early cache set resolution

Microprocessors & Microsystems
Speculative trivialization point advancing in high-performance processors

Journal of Systems Architecture: the EUROMICRO Journal
BulkSC: bulk enforcement of sequential consistency

Proceedings of the 34th annual international symposium on Computer architecture
Performance-driven processor allocation

OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
Algorithm 867: QUADLOG—a package of routines for generating Gauss-related quadrature for two classes of logarithmic weight functions

ACM Transactions on Mathematical Software (TOMS)
An enhanced DLX-based superscalar system simulator

WCAE-3 '97 Proceedings of the 1997 workshop on Computer architecture education
Evaluating the performance of dynamic branch prediction schemes with BPSim

WCAE-3 '97 Proceedings of the 1997 workshop on Computer architecture education
SATSim: a superscalar architecture trace simulator using interactive animation

WCAE '00 Proceedings of the 2000 workshop on Computer architecture education
Superscalar out-of-order demystified in four instructions

WCAE '03 Proceedings of the 2003 workshop on Computer architecture education: Held in conjunction with the 30th International Symposium on Computer Architecture
PSATSim: an interactive graphical superscalar architecture simulator for power and performance analysis

WCAE '06 Proceedings of the 2006 workshop on Computer architecture education: held in conjunction with the 33rd International Symposium on Computer Architecture
Power-aware operand delivery

ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
On the latency, energy and area of checkpointed, superscalar register alias tables

ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
Scalable Dynamic Instruction Scheduler through Wake-Up Spatial Locality

IEEE Transactions on Computers
Hiding the misprediction penalty of a resource-efficient high-performance processor

ACM Transactions on Architecture and Code Optimization (TACO)
Incrementally parallelizing database transactions with thread-level speculation

ACM Transactions on Computer Systems (TOCS)
Predicting and Exploiting Transient Values for Reducing Register File Pressure and Energy Consumption

IEEE Transactions on Computers
Branch-on-random

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Hardware support for early register release

International Journal of High Performance Computing and Networking
Speeding-up multiprocessors running DBMS workloads through coherence protocols

International Journal of High Performance Computing and Networking
Compiler and hardware support for reducing the synchronization of speculative threads

ACM Transactions on Architecture and Code Optimization (TACO)
High performance set associative translation lookaside buffers for low power microprocessors

Integration, the VLSI Journal
Asymmetrically banked value-aware register files for low-energy and high-performance

Microprocessors & Microsystems
Early detection and bypassing of trivial operations to improve energy efficiency of processors

Microprocessors & Microsystems
Achieving Out-of-Order Performance with Almost In-Order Complexity

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
A Two-Level Load/Store Queue Based on Execution Locality

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
A physical level study and optimization of CAM-based checkpointed register alias table

Proceedings of the 13th international symposium on Low power electronics and design
Power-efficient clustering via incomplete bypassing

Proceedings of the 13th international symposium on Low power electronics and design
Transparent reconfigurable acceleration for heterogeneous embedded applications

Proceedings of the conference on Design, automation and test in Europe
Improving support for locality and fine-grain sharing in chip multiprocessors

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Zero loads: canceling load requests by tracking zero values

Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Improving error tolerance for multithreaded register files

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A distributed processor state management architecture for large-window processors

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Dynamically Adapted Low Power ASIPs

ARC '09 Proceedings of the 5th International Workshop on Reconfigurable Computing: Architectures, Tools and Applications
Making effective decisions in computer architects' real-world: lessons and experiences with Godson-2 processor designs

Journal of Computer Science and Technology
Reexecution and Selective Reuse in Checkpoint Processors

Transactions on High-Performance Embedded Architectures and Compilers II
A compiler optimization to reduce soft errors in register files

Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Implementing a 1GHz four-issue out-of-order execution microprocessor in a standard cell ASIC methodology

Journal of Computer Science and Technology
InvisiFence: performance-transparent memory ordering in conventional multiprocessors

Proceedings of the 36th annual international symposium on Computer architecture
Checkpoint allocation and release

ACM Transactions on Architecture and Code Optimization (TACO)
An energy-efficient checkpointing mechanism for out of order commit processor

Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Paired ROBs: A Cost-Effective Reorder Buffer Sharing Strategy for SMT Processors

Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Folding active list for high performance and low power

ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Exploiting execution locality with a decoupled Kilo-instruction processor

ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Decoupled state-execute architecture

ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Using a way cache to improve performance of set-associative caches

ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Turbo-ROB: a low cost checkpoint/restore accelerator

HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
A power-aware hybrid RAM-CAM renaming mechanism for fast recovery

ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Forwardflow: a scalable core for power-constrained CMPs

Proceedings of the 37th annual international symposium on Computer architecture
An intra-chip free-space optical interconnect

Proceedings of the 37th annual international symposium on Computer architecture
On the latency and energy of checkpointed superscalar register alias tables

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A compiler-microarchitecture hybrid approach to soft error reduction for register files

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Characterization and exploitation of narrow-width loads: the narrow-width cache approach

CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Wake-up logic optimizations through selective match and wakeup range limitation

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
On the exploitation of narrow-width values for improving register file reliability

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Performance evaluation of superscalar processor with multi-bank register file using SPEC2000

ICCOMP'06 Proceedings of the 10th WSEAS international conference on Computers
Towards an adaptable multiple-ISA reconfigurable processor

ARC'11 Proceedings of the 7th international conference on Reconfigurable computing: architectures, tools and applications
Boosting parallel applications performance on applying DIM technique in a multiprocessing environment

International Journal of Reconfigurable Computing - Special issue on selected papers from the 17th reconfigurable architectures workshop (RAW2010)
Bridge floating-point fused multiply-add design

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A framework for correction of multi-bit soft errors in L2 caches based on redundancy

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A case for an SC-preserving compiler

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
A unified approach to eliminate memory accesses early

CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
CoreSymphony: an efficient reconfigurable multi-core architecture

ACM SIGARCH Computer Architecture News
Efficient sequential consistency via conflict ordering

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Achieving reliable system performance by fast recovery of branch miss prediction

Journal of Network and Computer Applications
Speculative issue logic

ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
A memory bandwidth effective cache store miss policy

ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
An optimized front-end physical register file with banking and writeback filtering

PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
Exploiting narrow values for energy efficiency in the register files of superscalar microprocessors

PATMOS'06 Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
RELOCATE: register file local access pattern redistribution mechanism for power and thermal management in out-of-order embedded processor

HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Complexity-Effective rename table design for rapid speculation recovery

ARCS'10 Proceedings of the 23rd international conference on Architecture of Computing Systems
Static analysis and compiler design for idempotent processing

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Reducing L1 caches power by exploiting software semantics

Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
iGPU: exception support and speculative execution on GPUs

Proceedings of the 39th Annual International Symposium on Computer Architecture
Accurately modeling superscalar processor performance with reduced trace

Journal of Parallel and Distributed Computing
Towards a multiple-ISA embedded system

Journal of Systems Architecture: the EUROMICRO Journal
Virtual register renaming

ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
Exploiting replicated checkpoints for soft error detection and correction

Proceedings of the Conference on Design, Automation and Test in Europe
Software-based register file vulnerability reduction for embedded processors

ACM Transactions on Embedded Computing Systems (TECS) - Special Section on ESTIMedia'10

Quantified Score

Hi-index	0.06

Abstract

The MIPS R10000 is a dynamic superscalar microprocessor that implements the 64-bit MIPS IV instruction set architecture. It fetches and decodes four instructions per cycle and dynamically issues them to five fully pipelined, low-latency execution units. Instructions can be fetched and executed speculatively beyond branches. Instructions graduate in order upon completion. Although instructions execute out of order, the processor still provides sequential memory consistency and precise exception handling. The R10000 is designed for high performance, even in large real-world applications that have poor memory locality. With speculative execution, it calculates memory addresses and initiates cache refills early. Its hierarchical nonblocking memory system helps hide memory latency with two levels of set-associative, write-back caches.
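
The abstract's combination of out-of-order execution with in-order graduation can be illustrated with a toy model. The sketch below is not the R10000's hardware design; the function name `graduate_in_order`, the instruction labels, and the use of a simple queue are assumptions made for illustration only. It shows one idea: an instruction leaves the queue only when it and every older instruction have finished executing, which is what keeps exceptions precise.

```python
from collections import deque


def graduate_in_order(program, completion_order):
    """Return the order in which instructions graduate.

    program: instruction names in program order (hypothetical labels).
    completion_order: the same names in the order they finish executing,
        which may differ from program order.
    """
    pending = deque(program)  # oldest instruction sits at the left end
    done = set()
    graduated = []
    for name in completion_order:
        done.add(name)
        # Only the oldest pending instruction may graduate, and only
        # once it has completed; younger completed ones must wait.
        while pending and pending[0] in done:
            graduated.append(pending.popleft())
    return graduated


if __name__ == "__main__":
    program = ["load", "add", "mul", "store"]
    completion_order = ["add", "mul", "load", "store"]  # load finishes late
    print(graduate_in_order(program, completion_order))
```

In this model the late-finishing `load` holds back `add` and `mul` even though they completed first, so the graduation order always matches program order.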