High-bandwidth data memory systems for superscalar processors

Authors:
Gurindar S. Sohi;Manoj Franklin
Affiliations:
-;-
Venue:
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Year:
1991

Citing 19
Cited 85

Highly concurrent scalar processing

ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
HPS, a new microarchitecture: rationale and introduction

MICRO 18 Proceedings of the 18th annual workshop on Microprogramming
Hardware support for large atomic units in dynamically scheduled machines

MICRO 21 Proceedings of the 21st annual workshop on Microprogramming and microarchitecture
Available instruction-level parallelism for superscalar and superpipelined machines

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
SIMP (Single Instruction stream/Multiple instruction Pipelining): a novel high-speed single-processor architecture

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
SIMP (Single Instruction stream/Multiple instruction Pipelining): a novel high-speed single-processor architecture

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers

IEEE Transactions on Computers
Cache and memory hierarchy design: a performance-directed approach

Cache and memory hierarchy design: a performance-directed approach
Machine organization of the IBM RISC System/6000 processor

IBM Journal of Research and Development
Micro 2000

IEEE Spectrum
Cache performance of the integer SPEC benchmarks on a RISC

ACM SIGARCH Computer Architecture News
Trace-driven simulations for a two-level cache design in open bus systems

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Boosting beyond static scheduling in a superscalar processor

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Cache Memories

ACM Computing Surveys (CSUR)
Bibliography and reading on CPU cache memories and related topics

ACM SIGARCH Computer Architecture News
Using cache memory to reduce processor-memory traffic

ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
Decoupled access/execute computer architectures

ISCA '82 Proceedings of the 9th annual symposium on Computer Architecture
Lockup-free instruction fetch/prefetch cache organization

ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture

The effect of real data cache behavior on the performance of a microarchitecture that supports dynamic scheduling

MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Data access microarchitectures for superscalar processors with compiler-assisted data prefetching

MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Computer Technology and Architecture: An Evolving Interaction

Computer
Delayed consistency and its effects on the miss rate of parallel programs

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
The expandable split window paradigm for exploiting fine-grain parallelsim

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Reducing memory latency via non-blocking and prefetching caches

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Tradeoffs in processor/memory interfaces for superscalar processors

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Pseudo vector processor based on register-windowed superscalar pipeline

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Balanced scheduling: instruction scheduling when memory latency is uncertain

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Cache write policies and performance

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Limitations of cache prefetching on a bus-based multiprocessor

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
A scalar architecture for pseudo vector processing based on slide-windowed registers

ICS '93 Proceedings of the 7th international conference on Supercomputing
Tradeoffs in two-level on-chip caching

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Exploring the design space for a shared-cache multiprocessor

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Complexity/performance tradeoffs with non-blocking loads

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Effective cache prefetching on bus-based multiprocessors

ACM Transactions on Computer Systems (TOCS)
The benefits of clustering in shared address space multiprocessors: an applications-driven investigation

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
An analytical model of high performance superscalar-based multiprocessors

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Increasing cache bandwidth using multi-port caches for exploiting ILP in non-numerical code

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
SPAID: software prefetching in pointer- and call-intensive environments

Proceedings of the 28th annual international symposium on Microarchitecture
Cache miss heuristics and preloading techniques for general-purpose programs

Proceedings of the 28th annual international symposium on Microarchitecture
Memory bandwidth limitations of future microprocessors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Increasing cache port efficiency for dynamic superscalar microprocessors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
High-bandwidth address translation for multiple-issue processors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The case for a single-chip multiprocessor

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Improving single-process performance with multithreaded processors

ICS '96 Proceedings of the 10th international conference on Supercomputing
Tango: a hardware-based data prefetching technique for superscalar processors

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Constructing instruction traces from cache-filtered address traces (CITCAT)

ACM SIGARCH Computer Architecture News
Two-ported cache alternatives for superscalar processors

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
A study on the number of memory ports in multiple instruction issue machines

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Increasing memory bandwidth with wide buses: compiler, hardware and performance trade-offs

ICS '97 Proceedings of the 11th international conference on Supercomputing
Data caches for superscalar processors

ICS '97 Proceedings of the 11th international conference on Supercomputing
The design and analysis of a cache architecture for texture mapping

Proceedings of the 24th annual international symposium on Computer architecture
Designing high bandwidth on-chip caches

Proceedings of the 24th annual international symposium on Computer architecture
Run-time adaptive cache hierarchy management via reference analysis

Proceedings of the 24th annual international symposium on Computer architecture
On high-bandwidth data cache design for multi-issue processors

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Streamlining inter-operation memory communication via data dependence prediction

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Resource widening versus replication: limits and performance-cost trade-off

ICS '98 Proceedings of the 12th international conference on Supercomputing
Utilizing reuse information in data cache management

ICS '98 Proceedings of the 12th international conference on Supercomputing
Retrospective: instruction issue logic for high-performance, interruptable pipelined processors

25 years of the international symposia on Computer architecture (selected papers)
Simultaneous multithreading: maximizing on-chip parallelism

25 years of the international symposia on Computer architecture (selected papers)
Effects of Multithreading on Cache Performance

IEEE Transactions on Computers - Special issue on cache memory and related problems
A dynamic scheduling logic for exploiting multiple functional units in single chip multithreaded architectures

Proceedings of the 1999 ACM symposium on Applied computing
Decoupling local variable accesses in a wide-issue superscalar processor

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
The Superthreaded Processor Architecture

IEEE Transactions on Computers
Access region locality for high-bandwidth processor memory system design

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Active Management of Data Caches by Exploiting Reuse Information

IEEE Transactions on Computers
Run-Time Cache Bypassing

IEEE Transactions on Computers
Hardware spatial forwarding for widely shared data

Proceedings of the 14th international conference on Supercomputing
High Bandwidth On-Chip Cache Design

IEEE Transactions on Computers
Dead-block prediction & dead-block correlating prefetchers

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
A High-Bandwidth Memory Pipeline for Wide Issue Processors

IEEE Transactions on Computers
Improving Latency Tolerance of Multithreading through Decoupling

IEEE Transactions on Computers
Facilitating level three cache studies using set sampling

Proceedings of the 32nd conference on Winter simulation
Reducing the complexity of the register file in dynamic superscalar processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Optimizing a Superscalar Machine to Run Vector Code

IEEE Parallel & Distributed Technology: Systems & Technology
A Memory Controller for Improved Performance of Streamed Computations on Symmetric Multiprocessors

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Microprocessors - 10 Years Back, 10 Years Ahead

Informatics - 10 Years Back. 10 Years Ahead.
Access ordering and memory-conscious cache utilization

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
How Useful Are Non-Blocking Loads, Stream Buffers and Speculative Execution in Multiple Issue Processors?

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Program balance and its impact on high performance RISC architectures

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Just Say No: Benefits of Early Cache Miss Determination

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
The Inaccuracy of Trace-Driven Simulation Using Incomplete Multiprogramming Trace Data

MASCOTS '96 Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
Balanced scheduling: instruction scheduling when memory latency is uncertain

ACM SIGPLAN Notices - Best of PLDI 1979-1999
Wire Delay is Not a Problem for SMT (In the Near Future)

Proceedings of the 31st annual international symposium on Computer architecture
Store Buffer Design in First-Level Multibanked Data Caches

Proceedings of the 32nd annual international symposium on Computer Architecture
A NUCA substrate for flexible CMP cache sharing

Proceedings of the 19th annual international conference on Supercomputing
Exploiting the replication cache to improve performance for multiple-issue microprocessors

ACM SIGARCH Computer Architecture News - Special issue: MEDEA 2004 workshop
Exploiting the replication cache to improve cache read bandwidth cost effectively

MEDEA '05 Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications , systems and architecture
Scalable Cache Miss Handling for High Memory-Level Parallelism

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
I-cache multi-banking and vertical interleaving

Proceedings of the 17th ACM Great Lakes symposium on VLSI
Module Partitioning and Interlaced Data Placement Schemes to Reduce Conflicts in Interleaved Memories

ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
3D-Stacked Memory Architectures for Multi-core Processors

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Parallel Memory Architecture for Application-Specific Instruction-Set Processors

Journal of Signal Processing Systems
Access region cache with register guided memory reference partitioning

Journal of Systems Architecture: the EUROMICRO Journal
A load-instruction unit for pipelined processors

IBM Journal of Research and Development
Parallel memory architecture for TTA processor

SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
Characterization of Fixed and Reconfigurable Multi-Core Devices for Application Acceleration

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Dynamic partition of memory reference instructions – a register guided approach

Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
A high performance adaptive miss handling architecture for chip multiprocessors

Transactions on High-Performance Embedded Architectures and Compilers IV
FPGA based efficient on-chip memory for image processing algorithms

Microelectronics Journal
Virtually split cache: An efficient mechanism to distribute instructions and data

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.02

High-bandwidth data memory systems for superscalar processors

Quantified Score

Visualization

Abstract