Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading
ACM Transactions on Computer Systems (TOCS)
Hardware fault containment in scalable shared-memory multiprocessors
Proceedings of the 24th annual international symposium on Computer architecture
Designing high bandwidth on-chip caches
Proceedings of the 24th annual international symposium on Computer architecture
Memory-system design considerations for dynamically-scheduled processors
Proceedings of the 24th annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Prediction caches for superscalar processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The multicluster architecture: reducing cycle time through partitioning
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Out-of-order vector architectures
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Initial results on the performance and cost of vector microprocessors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
A performance study of out-of-order vector architectures and short registers
ICS '98 Proceedings of the 12th international conference on Supercomputing
Informing memory operations: memory performance feedback mechanisms and their applications
ACM Transactions on Computer Systems (TOCS)
Dynamic history-length fitting: a third level of adaptivity for branch prediction
Proceedings of the 25th annual international symposium on Computer architecture
Options for dynamic address translation in COMAs
Proceedings of the 25th annual international symposium on Computer architecture
Threaded multiple path execution
Proceedings of the 25th annual international symposium on Computer architecture
Selective eager execution on the PolyPath architecture
Proceedings of the 25th annual international symposium on Computer architecture
Proceedings of the 1st international workshop on Software and performance
Simple vector microprocessors for multimedia applications
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
An empirical study of decentralized ILP execution models
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Fast out-of-order processor simulation using memoization
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Functional Implementation Techniques for CPU Cache Memories
IEEE Transactions on Computers - Special issue on cache memory and related problems
Automatic Compiler-Inserted Prefetching for Pointer-Based Applications
IEEE Transactions on Computers - Special issue on cache memory and related problems
Tolerating late memory traps in ILP processors
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Decoupling local variable accesses in a wide-issue superscalar processor
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Commit-reconcile & fences (CRF): a new memory model for architects and compiler writers
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
ICS '99 Proceedings of the 13th international conference on Supercomputing
Way-predicting set-associative cache for high performance and low energy consumption
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Concurrent Event Handling through Multithreading
IEEE Transactions on Computers
Access region locality for high-bandwidth processor memory system design
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
The use of multithreading for exception handling
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Software-Directed Register Deallocation for Simultaneous Multithreaded Processors
IEEE Transactions on Parallel and Distributed Systems
The Multicluster Architecture: Reducing Processor Cycle Time Through Partitioning
International Journal of Parallel Programming
Using complete system simulation to characterize SPECjvm98 benchmarks
Proceedings of the 14th international conference on Supercomputing
IEEE Transactions on Computers
Efficient performance prediction for modern microprocessors
Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A scalable approach to thread-level speculation
Proceedings of the 27th annual international symposium on Computer architecture
A fully associative software-managed cache design
Proceedings of the 27th annual international symposium on Computer architecture
Design and Evaluation of a Switch Cache Architecture for CC-NUMA Multiprocessors
IEEE Transactions on Computers
FLASH vs. (simulated) FLASH: closing the simulation loop
ACM SIGPLAN Notices
ACM Computing Surveys (CSUR)
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
On pipelining dynamic instruction scheduling logic
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Register integration: a simple and efficient implementation of squash reuse
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Architectural and compiler support for effective instruction prefetching: a cooperative approach
ACM Transactions on Computer Systems (TOCS)
A circuit level implementation of an adaptive issue queue for power-aware microprocessors
GLSVLSI '01 Proceedings of the 11th Great Lakes symposium on VLSI
Inherently Lower-Power High-Performance Superscalar Architectures
IEEE Transactions on Computers
High Bandwidth On-Chip Cache Design
IEEE Transactions on Computers
Improving index performance through prefetching
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Improving Gang Scheduling through job performance analysis and malleability
ICS '01 Proceedings of the 15th international conference on Supercomputing
Integrating superscalar processor components to implement register caching
ICS '01 Proceedings of the 15th international conference on Supercomputing
FLASH vs. (Simulated) FLASH: closing the simulation loop
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Dynamically allocating processor resources between nearby and distant ILP
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
L1 data cache decomposition for energy efficiency
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Energy reduction in queues and stacks by adaptive bitwidth compression
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
A High-Bandwidth Memory Pipeline for Wide Issue Processors
IEEE Transactions on Computers
IEEE Transactions on Computers
Improving Latency Tolerance of Multithreading through Decoupling
IEEE Transactions on Computers
Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
SMT Layout Overhead and Scalability
IEEE Transactions on Parallel and Distributed Systems
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Measuring memory hierarchy performance of cache-coherent multiprocessors using micro benchmarks
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Full-system timing-first simulation
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Characterizing operating system activity in SPECjvm98 Benchmarks
Workload characterization of emerging computer applications
Bloom filtering cache misses for accurate data speculation and prefetching
ICS '02 Proceedings of the 16th international conference on Supercomputing
Select-free instruction scheduling logic
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
A high-speed dynamic instruction scheduling scheme for superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Reducing the complexity of the register file in dynamic superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Fractal prefetching B+-Trees: optimizing both cache and disk performance
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Energy-efficient hybrid wakeup logic
Proceedings of the 2002 international symposium on Low power electronics and design
Register tiling in nonrectangular iteration spaces
ACM Transactions on Programming Languages and Systems (TOPLAS)
Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
WOSP '02 Proceedings of the 3rd international workshop on Software and performance
A Simulation Study of Decoupled Vector Architectures
The Journal of Supercomputing
Speculative synchronization: applying thread-level speculation to explicitly parallel applications
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Understanding and improving operating system effects in control flow prediction
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Compiler optimization of scalar value communication between speculative threads
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Dynamic dead-instruction detection and elimination
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Computer
Optimizing Main-Memory Join on Modern Hardware
IEEE Transactions on Knowledge and Data Engineering
Access Control Mechanisms in a Distributed, Persistent Memory System
IEEE Transactions on Parallel and Distributed Systems
Improving the Precise Interrupt Mechanism of Software-Managed TLB Miss Handlers
HiPC '01 Proceedings of the 8th International Conference on High Performance Computing
Selective Register Renaming: A Compiler-Driven Approach to Dynamic Register Renaming
HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
Run-Time Support to Register Allocation for Loop Parallelization of Image Processing Programs
HPCN Europe 2000 Proceedings of the 8th International Conference on High-Performance Computing and Networking
Conflict-Free Access to Multiple Single-Ported Register Files
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Hierarchical Interconnects for On-Chip Clustering
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
An Adaptive Issue Queue for Reduced Power at High Performance
PACS '00 Proceedings of the First International Workshop on Power-Aware Computer Systems-Revised Papers
Efficient Interconnects for Clustered Microarchitectures
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Speculative Sequential Consistency with Little Custom Storage
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
A Statistical-Empirical Hybrid Approach to Hierarchical Memory Analysis
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Considerations for Scalable CAE on the SGI ccNUMA Architecture
HPCN Europe 2000 Proceedings of the 8th International Conference on High-Performance Computing and Networking
Execution Latency Reduction via Variable Latency Pipeline and Instruction Reuse
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Cherry: checkpointed early resource recycling in out-of-order microprocessors
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Three extensions to register integration
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Enhancing memory level parallelism via recovery-free value prediction
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Power-efficient issue queue design
Power aware computing
ARVLSI '97 Proceedings of the 17th Conference on Advanced Research in VLSI (ARVLSI '97)
Front-End Policies for Improved Issue Efficiency in SMT Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
A Statistically Rigorous Approach for Improving Simulation Methodology
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Dynamic Data Dependence Tracking and its Application to Branch Prediction
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Instruction-level parallel processors-dynamic and static scheduling tradeoffs
PAS '97 Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis
Simultaneous Multithreading-Based Routers
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
Visualizing Application Behavior on Superscalar Processors
INFOVIS '99 Proceedings of the 1999 IEEE Symposium on Information Visualization
Token coherence: decoupling performance and correctness
Proceedings of the 30th annual international symposium on Computer architecture
Cyclone: a broadcast-free dynamic instruction scheduler with selective replay
Proceedings of the 30th annual international symposium on Computer architecture
On load latency in low-power caches
Proceedings of the 2003 international symposium on Low power electronics and design
Low cost instruction cache designs for tag comparison elimination
Proceedings of the 2003 international symposium on Low power electronics and design
Checkpointing alternatives for high performance, power-aware processors
Proceedings of the 2003 international symposium on Low power electronics and design
Branch prediction on demand: an energy-efficient solution
Proceedings of the 2003 international symposium on Low power electronics and design
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Multiple-path execution for chip multiprocessors
Journal of Systems Architecture: the EUROMICRO Journal
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Near-Optimal Precharging in High-Performance Nanoscale CMOS Caches
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Synthesis and verification
Scheduling Reusable Instructions for Power Reduction
Proceedings of the conference on Design, automation and test in Europe - Volume 1
Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Leakage Energy Reduction in Register Renaming
ICDCSW '04 Proceedings of the 24th International Conference on Distributed Computing Systems Workshops - W7: EC (ICDCSW'04) - Volume 7
Reducing register pressure through LAER algorithm
ACSC '04 Proceedings of the 27th Australasian conference on Computer science - Volume 26
Tolerating Late Memory Traps in Dynamically Scheduled Processors
IEEE Transactions on Computers
A general decomposition strategy for verifying register renaming
Proceedings of the 41st annual Design Automation Conference
Memory Ordering: A Value-Based Approach
Proceedings of the 31st annual international symposium on Computer architecture
SMTp: An Architecture for Next-generation Scalable Multi-threading
Proceedings of the 31st annual international symposium on Computer architecture
Proceedings of the 31st annual international symposium on Computer architecture
Impact of technology scaling on energy aware execution cache-based microarchitectures
Proceedings of the 2004 international symposium on Low power electronics and design
Mixed-clock issue queue design for energy aware, high-performance cores
Proceedings of the 2004 Asia and South Pacific Design Automation Conference
Late Allocation and Early Release of Physical Registers
IEEE Transactions on Computers
Journal of Systems and Software - Special issue: Performance modeling and analysis of computer systems and networks
Coherence decoupling: making use of incoherence
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Area and System Clock Effects on SMT/CMP Throughput
IEEE Transactions on Computers
Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
On-Chip Interconnects and Instruction Steering Schemes for Clustered Microarchitectures
IEEE Transactions on Parallel and Distributed Systems
An analysis of a resource efficient checkpoint architecture
ACM Transactions on Architecture and Code Optimization (TACO)
Execution cache-based microarchitecture power-efficient superscalar processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Increased Scalability and Power Efficiency by Using Multiple Speed Pipelines
Proceedings of the 32nd annual international symposium on Computer Architecture
Store Vulnerability Window (SVW): Re-Execution Filtering for Enhanced Load Optimization
Proceedings of the 32nd annual international symposium on Computer Architecture
Dynamic Verification of Sequential Consistency
Proceedings of the 32nd annual international symposium on Computer Architecture
Enhancing Memory-Level Parallelism via Recovery-Free Value Prediction
IEEE Transactions on Computers
Moving Address Translation Closer to Memory in Distributed Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
The STAMPede approach to thread-level speculation
ACM Transactions on Computer Systems (TOCS)
Optimistic intra-transaction parallelism on chip multiprocessors
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Fast branch misprediction recovery in out-of-order superscalar processors
Proceedings of the 19th annual international conference on Supercomputing
Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Memory State Compressors for Giga-Scale Checkpoint/Restore
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Improving Computer Architecture Simulation Methodology by Adding Statistical Rigor
IEEE Transactions on Computers
Scalability Aspects of Instruction Distribution Algorithms for Clustered Processors
IEEE Transactions on Parallel and Distributed Systems
Error-tolerance memory Microarchitecture via Dynamic Multithreading
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
A New Pointer-based Instruction Queue Design and Its Power-Performance Evaluation
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Instruction Replication for Reducing Delays Due to Inter-PE Communication Latency
IEEE Transactions on Computers
Address-Indexed Memory Disambiguation and Store-to-Load Forwarding
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Cherry-MP: Correctly Integrating Checkpointed Early Resource Recycling in Chip Multiprocessors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
International Journal of Parallel Programming
International Journal of Parallel Programming
Speculative early register release
Proceedings of the 3rd conference on Computing frontiers
In-Line Interrupt Handling and Lock-Up Free Translation Lookaside Buffers (TLBs)
IEEE Transactions on Computers
Tolerating Dependences Between Large Speculative Threads Via Sub-Threads
Proceedings of the 33rd annual international symposium on Computer Architecture
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Microarchitecture of the Godson-2 processor
Journal of Computer Science and Technology
SEED: scalable, efficient enforcement of dependences
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Early Register Deallocation Mechanisms Using Checkpointed Register Files
IEEE Transactions on Computers
A regulated transitive reduction (RTR) for longer memory race recording
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
High-level power analysis for multi-core chips
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
BranchTap: improving performance with very few checkpoints through adaptive speculation control
Proceedings of the 20th annual international conference on Supercomputing
OS-Aware Branch Prediction: Improving Microprocessor Control Flow Prediction for Operating Systems
IEEE Transactions on Computers
I-cache multi-banking and vertical interleaving
Proceedings of the 17th ACM Great Lakes symposium on VLSI
Accelerating sequential programs on Chip Multiprocessors via Dynamic Prefetching Thread
Microprocessors & Microsystems
Reducing I-cache energy of multimedia applications through low cost tag comparison elimination
Journal of Embedded Computing - Cache exploitation in embedded systems
Reducing non-deterministic loads in low-power caches via early cache set resolution
Microprocessors & Microsystems
Speculative trivialization point advancing in high-performance processors
Journal of Systems Architecture: the EUROMICRO Journal
BulkSC: bulk enforcement of sequential consistency
Proceedings of the 34th annual international symposium on Computer architecture
Performance-driven processor allocation
OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
ACM Transactions on Mathematical Software (TOMS)
An enhanced DLX-based superscalar system simulator
WCAE-3 '97 Proceedings of the 1997 workshop on Computer architecture education
Evaluating the performance of dynamic branch prediction schemes with BPSim
WCAE-3 '97 Proceedings of the 1997 workshop on Computer architecture education
SATSim: a superscalar architecture trace simulator using interactive animation
WCAE '00 Proceedings of the 2000 workshop on Computer architecture education
Superscalar out-of-order demystified in four instructions
WCAE '03 Proceedings of the 2003 workshop on Computer architecture education: Held in conjunction with the 30th International Symposium on Computer Architecture
WCAE '06 Proceedings of the 2006 workshop on Computer architecture education: held in conjunction with the 33rd International Symposium on Computer Architecture
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
On the latency, energy and area of checkpointed, superscalar register alias tables
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
Scalable Dynamic Instruction Scheduler through Wake-Up Spatial Locality
IEEE Transactions on Computers
Hiding the misprediction penalty of a resource-efficient high-performance processor
ACM Transactions on Architecture and Code Optimization (TACO)
Incrementally parallelizing database transactions with thread-level speculation
ACM Transactions on Computer Systems (TOCS)
IEEE Transactions on Computers
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Hardware support for early register release
International Journal of High Performance Computing and Networking
Speeding-up multiprocessors running DBMS workloads through coherence protocols
International Journal of High Performance Computing and Networking
Compiler and hardware support for reducing the synchronization of speculative threads
ACM Transactions on Architecture and Code Optimization (TACO)
High performance set associative translation lookaside buffers for low power microprocessors
Integration, the VLSI Journal
Asymmetrically banked value-aware register files for low-energy and high-performance
Microprocessors & Microsystems
Early detection and bypassing of trivial operations to improve energy efficiency of processors
Microprocessors & Microsystems
Achieving Out-of-Order Performance with Almost In-Order Complexity
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
A Two-Level Load/Store Queue Based on Execution Locality
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
A physical level study and optimization of CAM-based checkpointed register alias table
Proceedings of the 13th international symposium on Low power electronics and design
Power-efficient clustering via incomplete bypassing
Proceedings of the 13th international symposium on Low power electronics and design
Transparent reconfigurable acceleration for heterogeneous embedded applications
Proceedings of the conference on Design, automation and test in Europe
Improving support for locality and fine-grain sharing in chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Zero loads: canceling load requests by tracking zero values
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Improving error tolerance for multithreaded register files
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A distributed processor state management architecture for large-window processors
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Dynamically Adapted Low Power ASIPs
ARC '09 Proceedings of the 5th International Workshop on Reconfigurable Computing: Architectures, Tools and Applications
Journal of Computer Science and Technology
Reexecution and Selective Reuse in Checkpoint Processors
Transactions on High-Performance Embedded Architectures and Compilers II
A compiler optimization to reduce soft errors in register files
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Journal of Computer Science and Technology
InvisiFence: performance-transparent memory ordering in conventional multiprocessors
Proceedings of the 36th annual international symposium on Computer architecture
Checkpoint allocation and release
ACM Transactions on Architecture and Code Optimization (TACO)
An energy-efficient checkpointing mechanism for out of order commit processor
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Paired ROBs: A Cost-Effective Reorder Buffer Sharing Strategy for SMT Processors
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Folding active list for high performance and low power
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Exploiting execution locality with a decoupled Kilo-instruction processor
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Decoupled state-execute architecture
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Using a way cache to improve performance of set-associative caches
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Turbo-ROB: a low cost checkpoint/restore accelerator
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
A power-aware hybrid RAM-CAM renaming mechanism for fast recovery
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Forwardflow: a scalable core for power-constrained CMPs
Proceedings of the 37th annual international symposium on Computer architecture
An intra-chip free-space optical interconnect
Proceedings of the 37th annual international symposium on Computer architecture
On the latency and energy of checkpointed superscalar register alias tables
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A compiler-microarchitecture hybrid approach to soft error reduction for register files
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Characterization and exploitation of narrow-width loads: the narrow-width cache approach
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Wake-up logic optimizations through selective match and wakeup range limitation
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
On the exploitation of narrow-width values for improving register file reliability
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Performance evaluation of superscalar processor with multi-bank register file using SPEC2000
ICCOMP'06 Proceedings of the 10th WSEAS international conference on Computers
Towards an adaptable multiple-ISA reconfigurable processor
ARC'11 Proceedings of the 7th international conference on Reconfigurable computing: architectures, tools and applications
International Journal of Reconfigurable Computing - Special issue on selected papers from the 17th reconfigurable architectures workshop (RAW2010)
Bridge floating-point fused multiply-add design
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A framework for correction of multi-bit soft errors in L2 caches based on redundancy
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A case for an SC-preserving compiler
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
A unified approach to eliminate memory accesses early
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
CoreSymphony: an efficient reconfigurable multi-core architecture
ACM SIGARCH Computer Architecture News
Efficient sequential consistency via conflict ordering
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Achieving reliable system performance by fast recovery of branch miss prediction
Journal of Network and Computer Applications
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
A memory bandwidth effective cache store miss policy
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
An optimized front-end physical register file with banking and writeback filtering
PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
Exploiting narrow values for energy efficiency in the register files of superscalar microprocessors
PATMOS'06 Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Complexity-Effective rename table design for rapid speculation recovery
ARCS'10 Proceedings of the 23rd international conference on Architecture of Computing Systems
Static analysis and compiler design for idempotent processing
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Reducing L1 caches power by exploiting software semantics
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
iGPU: exception support and speculative execution on GPUs
Proceedings of the 39th Annual International Symposium on Computer Architecture
Accurately modeling superscalar processor performance with reduced trace
Journal of Parallel and Distributed Computing
Towards a multiple-ISA embedded system
Journal of Systems Architecture: the EUROMICRO Journal
ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
Exploiting replicated checkpoints for soft error detection and correction
Proceedings of the Conference on Design, Automation and Test in Europe
Software-based register file vulnerability reduction for embedded processors
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on ESTIMedia'10
Hi-index | 0.06 |
The Mips R10000 is a dynamic superscalar microprocessor that implements the 64-bit Mips-4 Instruction Set Architecture. It fetches and decodes four instructions per cycle and dynamically issues them to five fully pipelined low-latency execution units. Instructions can be fetched and executed speculatively beyond branches. Instructions graduate in order upon completion. Although instructions execute out of order, the processor still provides sequential memory consistency and precise exception handling.The R10000 is designed for high performance, even in large real-world applications which have poor memory locality. With speculative execution, it calculates memory addresses and initiates cache refills early. Its hierarchical nonblocking memory system helps hide memory latency with two levels of set-associative, write-back caches.