Hardware-software trade-offs in a direct Rambus implementation of the RAMpage memory hierarchy

Authors:
Philip Machanick;Pierre Salverda;Lance Pompe
Affiliations:
Department of Computer Science, University of the Witwatersrand, Private Bag 3, 2050 Wits, South Africa;Department of Computer Science, University of the Witwatersrand, Private Bag 3, 2050 Wits, South Africa;Department of Computer Science, University of the Witwatersrand, Private Bag 3, 2050 Wits, South Africa
Venue:
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Year:
1998

Abstract

The RAMpage memory hierarchy is an alternative to the traditional division between cache and main memory: main memory is moved up a level and DRAM is used as a paging device. The idea behind RAMpage is to reduce hardware complexity, albeit at the cost of software complexity, with a view to allowing more flexible memory system design. This paper investigates some issues in choosing between RAMpage and a conventional cache architecture, with a view to illustrating the trade-offs involved in deciding whether to place memory system complexity in hardware or in software. Performance results in this paper are based on a simple Rambus implementation of DRAM with the performance characteristics of Direct Rambus, which should be available in 1999. This paper explores the conditions under which it becomes feasible to perform a context switch on a miss in the RAMpage model, and the conditions under which RAMpage is a win over a conventional cache architecture: as the CPU-DRAM speed gap grows, RAMpage becomes more viable.
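
The closing claim, that a context switch on a miss becomes more attractive as the CPU-DRAM speed gap grows, can be illustrated with a toy calculation. The sketch below is not the paper's simulation model. It assumes a fixed DRAM latency in nanoseconds, a fixed context-switch cost in CPU cycles, and that another process is always ready to run, so a switch hides the whole DRAM transfer. All function names and parameters are hypothetical.

```python
def miss_penalty_cycles(dram_latency_ns, cpu_clock_mhz):
    """CPU cycles spent waiting for one DRAM transfer.

    Assumption: DRAM latency is fixed in nanoseconds, so the penalty
    measured in cycles grows in proportion to the CPU clock rate.
    """
    return dram_latency_ns * cpu_clock_mhz / 1000


def switch_pays_off(dram_latency_ns, cpu_clock_mhz, switch_cost_cycles):
    """True when switching context on a miss wastes fewer cycles than stalling.

    Assumption (toy model): another process is always ready, so a switch
    costs only its own overhead and overlaps the entire DRAM transfer.
    """
    penalty = miss_penalty_cycles(dram_latency_ns, cpu_clock_mhz)
    return switch_cost_cycles < penalty


if __name__ == "__main__":
    # Hypothetical figures chosen only to show the trend, not measured data.
    for clock_mhz in (200, 1000, 4000):
        decision = switch_pays_off(
            dram_latency_ns=50, cpu_clock_mhz=clock_mhz, switch_cost_cycles=100
        )
        print(clock_mhz, "MHz: switch on miss pays off?", decision)
```

In this simplified model the break-even point is reached when the miss penalty in cycles exceeds the switch cost, which happens at a higher clock rate for the same DRAM latency. A realistic evaluation would also need to account for misses incurred by the incoming process and for the software miss handler itself.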