ARB: A Hardware Mechanism for Dynamic Reordering of Memory References

Authors:
Manoj Franklin; Gurindar S. Sohi
Venue:
IEEE Transactions on Computers
Year:
1996

Abstract

To exploit instruction-level parallelism, it is important not only to execute multiple memory references per cycle, but also to reorder memory references, especially to execute loads before stores that precede them in the sequential instruction stream. To guarantee correctness of execution in such situations, memory reference addresses have to be disambiguated. This paper presents a novel hardware mechanism, called an Address Resolution Buffer (ARB), for performing dynamic reordering of memory references. The ARB supports the following features: 1) dynamic memory disambiguation in a decentralized manner, 2) multiple memory references per cycle, 3) out-of-order execution of memory references, 4) unresolved loads and stores, 5) speculative loads and stores, and 6) memory renaming. The paper presents the results of a simulation study that we conducted to verify the efficacy of the ARB for a superscalar processor. The paper also shows the ARB's application in a multiscalar processor.
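
The disambiguation problem the abstract describes can be illustrated with a small software model. The sketch below is an assumption-laden toy, not the ARB design from the paper: the class name AddressBufferSketch, its methods, and the use of integer sequence numbers for program order are all hypothetical. It shows a load running ahead of an earlier store, the late store detecting that the load read stale data, and forwarding from a buffered store to a later load (a simple form of memory renaming).

```python
class AddressBufferSketch:
    """Toy address-indexed buffer for out-of-order loads and stores.

    Hypothetical model, not the paper's hardware: each address maps to the
    loads and stores seen so far, keyed by program-order sequence number.
    """

    def __init__(self, memory):
        self.memory = memory  # committed state: address -> value
        self.stores = {}      # address -> {seq: buffered value}
        self.loads = {}       # address -> {seq: seq of supplying store, or None}

    def load(self, seq, addr):
        """Execute a load, forwarding from the closest earlier buffered store."""
        earlier = [s for s in self.stores.get(addr, {}) if s < seq]
        source = max(earlier) if earlier else None
        self.loads.setdefault(addr, {})[seq] = source
        if source is None:
            return self.memory.get(addr, 0)
        return self.stores[addr][source]

    def store(self, seq, addr, value):
        """Buffer a store; return the later loads that already read stale data."""
        self.stores.setdefault(addr, {})[seq] = value
        violated = []
        for load_seq, source in self.loads.get(addr, {}).items():
            # A later load is stale if its value came from memory or from
            # a store that precedes this one in program order.
            if load_seq > seq and (source is None or source < seq):
                violated.append(load_seq)
        return sorted(violated)

    def commit(self, seq):
        """Retire all entries up to and including seq, in program order."""
        for addr in list(self.stores):
            entries = self.stores[addr]
            for s in sorted(k for k in entries if k <= seq):
                self.memory[addr] = entries.pop(s)
            if not entries:
                del self.stores[addr]
        for addr in list(self.loads):
            kept = {s: src for s, src in self.loads[addr].items() if s > seq}
            if kept:
                self.loads[addr] = kept
            else:
                del self.loads[addr]


buf = AddressBufferSketch({0x10: 1})
early = buf.load(seq=3, addr=0x10)               # load runs ahead, reads memory
squash = buf.store(seq=2, addr=0x10, value=7)    # earlier store arrives late
retry = buf.load(seq=3, addr=0x10)               # re-executed load is forwarded
buf.commit(3)
print(early, squash, retry, buf.memory[0x10])    # 1 [3] 7 7
```

In this model a violation is reported as a list of sequence numbers to re-execute. How a real processor recovers (squashing, replay) is outside what the abstract specifies, so the sketch leaves that to the caller.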