Bulldog: a compiler for VLSI architectures
Bulldog: a compiler for VLSI architectures
Critical issues regarding HPS, a high performance microarchitecture
MICRO 18 Proceedings of the 18th annual workshop on Microprogramming
Implementing Precise Interrupts in Pipelined Processors
IEEE Transactions on Computers
IEEE Transactions on Computers
Architectural support for register allocation in the presence of aliasing
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
The expandable split window paradigm for exploiting fine-grain parallelsim
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Alternative implementations of two-level adaptive branch prediction
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Dynamic dependency analysis of ordinary programs
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
An architectural framework for migration from CISC to higher performance platforms
ICS '92 Proceedings of the 6th international conference on Supercomputing
The multiscalar architecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Control flow prediction for dynamic ILP processors
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Run-Time Disambiguation: Coping with Statically Unpredictable Dependencies
IEEE Transactions on Computers
Hpsm: exploiting concurrency to achieve high performance in a single-chip microarchitecture
Hpsm: exploiting concurrency to achieve high performance in a single-chip microarchitecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Speculative execution via address prediction and data prefetching
ICS '97 Proceedings of the 11th international conference on Supercomputing
Dynamic speculation and synchronization of data dependences
Proceedings of the 24th annual international symposium on Computer architecture
The design and performance of a conflict-avoiding cache
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Streamlining inter-operation memory communication via data dependence prediction
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Speculative multithreaded processors
ICS '98 Proceedings of the 12th international conference on Supercomputing
ICS '98 Proceedings of the 12th international conference on Supercomputing
25 years of the international symposia on Computer architecture (selected papers)
Removal of memory access bottlenecks for scheduling control-flow intensive behavioral descriptions
Proceedings of the 1998 IEEE/ACM international conference on Computer-aided design
Task selection for a multiscalar processor
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A dynamic multithreading processor
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Data speculation support for a chip multiprocessor
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Randomized Cache Placement for Eliminating Conflicts
IEEE Transactions on Computers - Special issue on cache memory and related problems
Speculation techniques for improving load related instruction scheduling
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Improving the performance of speculatively parallel applications on the Hydra CMP
ICS '99 Proceedings of the 13th international conference on Supercomputing
Classifying load and store instructions for memory renaming
ICS '99 Proceedings of the 13th international conference on Supercomputing
A Chip-Multiprocessor Architecture with Speculative Multithreading
IEEE Transactions on Computers
Control independence in trace processors
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
A scalable approach to thread-level speculation
Proceedings of the 27th annual international symposium on Computer architecture
Proceedings of the 27th annual international symposium on Computer architecture
Architectural support for scalable speculative parallelization in shared-memory multiprocessors
Proceedings of the 27th annual international symposium on Computer architecture
A time-stamping algorithm for efficient performance estimation of superscalar processors
Proceedings of the 2001 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
IEEE Transactions on Parallel and Distributed Systems
Speculative lock elision: enabling highly concurrent multithreaded execution
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
A general compiler framework for speculative multithreading
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Handling of packet dependencies: a critical issue for highly parallel network processors
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
The Need for Fast Communication in Hardware-Based Speculative Chip Multiprocessors
International Journal of Parallel Programming
Speculative Multithreaded Processors
Computer
Computer
Amir Roth: Speculative Multithreaded Processors
HiPC '00 Proceedings of the 7th International Conference on High Performance Computing
Weld: A Multithreading Technique Towards Latency-Tolerant VLIW Processors
HiPC '01 Proceedings of the 8th International Conference on High Performance Computing
Multiscalar Execution along a Single Flow of Control
ICPP '97 Proceedings of the international Conference on Parallel Processing
A Feasibility Study of Hierarchical Multithreading
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Limits of Task-Based Parallelism in Irregular Applications
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Tradeoffs in Buffering Memory State for Thread-Level Speculation in Multiprocessors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Implicitly-multithreaded processors
Proceedings of the 30th annual international symposium on Computer architecture
A Clustered Approach to Multithreaded Processors
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Multiple-path execution for chip multiprocessors
Journal of Systems Architecture: the EUROMICRO Journal
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Scalable Hardware Memory Disambiguation for High ILP Processors
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Min-cut program decomposition for thread-level speculation
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Transactional Memory Coherence and Consistency
Proceedings of the 31st annual international symposium on Computer architecture
Coherence decoupling: making use of incoherence
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Scalable Load and Store Processing in Latency Tolerant Processors
Proceedings of the 32nd annual international symposium on Computer Architecture
The STAMPede approach to thread-level speculation
ACM Transactions on Computer Systems (TOCS)
Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Speculative execution in a distributed file system
Proceedings of the twentieth ACM symposium on Operating systems principles
High-Performance and Low-Cost Dual-Thread VLIW Processor Using Weld Architecture Paradigm
IEEE Transactions on Parallel and Distributed Systems
Scalable Store-Load Forwarding via Store Queue Index Prediction
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Address-Indexed Memory Disambiguation and Store-to-Load Forwarding
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Cherry-MP: Correctly Integrating Checkpointed Early Resource Recycling in Chip Multiprocessors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Bulk Disambiguation of Speculative Threads in Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Speculative execution in a distributed file system
ACM Transactions on Computer Systems (TOCS)
Fire-and-Forget: Load/Store Scheduling with No Store Queue at All
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Speculative thread decomposition through empirical optimization
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Late-binding: enabling unordered load-store queues
Proceedings of the 34th annual international symposium on Computer architecture
Incrementally parallelizing database transactions with thread-level speculation
ACM Transactions on Computer Systems (TOCS)
A Two-Level Load/Store Queue Based on Execution Locality
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
A practical approach of memory access parallelization to exploit multiple off-chip DDR memories
Proceedings of the 45th annual Design Automation Conference
Tolerating latency in replicated state machines through client speculation
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
Supporting speculative parallelization in the presence of dynamic data structures
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Speculative parallelization using state separation and multiple value prediction
Proceedings of the 2010 international symposium on Memory management
SAMIE-LSQ: set-associative multiple-instruction entry load/store queue
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Enhanced speculative parallelization via incremental recovery
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Supporting speculative multithreading on simultaneous multithreaded processors
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Low-overhead core swapping for thermal management
PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
SCIN-cache: Fast speculative versioning in multithreaded cores
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Efficient execution of speculative threads and transactions with hardware transactional memory
Future Generation Computer Systems
Hi-index | 14.99 |
To exploit instruction level parallelism, it is important not only to execute multiple memory references per cycle, but also to reorder memory references(especially to execute loads before stores that precede them in the sequential instruction stream. To guarantee correctness of execution in such situations, memory reference addresses have to be disambiguated. This paper presents a novel hardware mechanism, called an Address Resolution Buffer (ARB), for performing dynamic reordering of memory references. The ARB supports the following features: 1) dynamic memory disambiguation in a decentralized manner, 2) multiple memory references per cycle, 3) out-of-order execution of memory references, 4) unresolved loads and stores, 5) speculative loads and stores, and 6) memory renaming. The paper presents the results of a simulation study that we conducted to verify the efficacy of the ARB for a superscalar processor. The paper also shows the ARB's application in a multiscalar processor.