A memory system design framework: creating smart memories

Authors:
Amin Firoozshahian;Alex Solomatnikov;Ofer Shacham;Zain Asgar;Stephen Richardson;Christos Kozyrakis;Mark Horowitz
Affiliations:
Hicamp Systems Inc., Menlo Park, CA, USA;Hicamp Systems Inc., Menlo Park, CA, USA;Stanford University, Stanford, CA, USA;Stanford University, Stanford, CA, USA;Stanford University, Stanford, CA, USA;Stanford University, Stanford, CA, USA;Stanford University, Stanford, CA, USA
Venue:
Proceedings of the 36th annual international symposium on Computer architecture
Year:
2009


Abstract

As CPU cores become building blocks, we see a great expansion in the types of on-chip memory systems proposed for CMPs. Unfortunately, designing the cache and protocol controllers to support these memory systems is complex, and their concurrency and latency characteristics significantly affect the performance of any CMP. To address this problem, this paper presents a microarchitecture framework for cache and protocol controllers, which can aid in generating the RTL for new memory systems. The framework consists of three pipelined engines (request-tracking, state-manipulation, and data movement), which are programmed to implement a higher-level memory model. This approach simplifies the design and verification of CMP systems by decomposing the memory model into sequences of state and data manipulations. Moreover, implementing the framework itself produces a polymorphic memory system. To validate the approach, we implemented a scalable, flexible CMP in silicon. The memory system was then programmed to support three disparate memory models: cache coherent shared memory, streams and transactional memory. Measured overheads of this approach seem promising. Our system generates controllers with performance overheads of less than 20% compared to an ideal controller with zero internal latency. Even the overhead of directly implementing a fully programmable controller was modest. While it did double the controller's area, the amortized effective area in the system grew by roughly 7%.
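
The decomposition described in the abstract can be illustrated with a small software model. What follows is only a minimal sketch under stated assumptions, not the paper's microarchitecture or RTL: the class `Controller`, the table `COHERENT`, the state names `I`/`S`/`M`, and the operation names are all hypothetical and chosen for illustration. It shows one way a memory model might be expressed as a table of state and data manipulations that a fixed set of engines executes.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Toy model of a programmable controller (all names are hypothetical)."""
    program: dict                                # (request, line state) -> list of steps
    state: dict = field(default_factory=dict)    # per-address line state
    pending: dict = field(default_factory=dict)  # request-tracking table
    log: list = field(default_factory=list)      # data movements that were issued

    def handle(self, req_id, request, addr):
        # Request tracking: remember the outstanding request while it is serviced.
        self.pending[req_id] = (request, addr)
        current = self.state.get(addr, "I")
        for engine, op, arg in self.program[(request, current)]:
            if engine == "state":
                # State manipulation: update the line's state.
                self.state[addr] = arg
            elif engine == "data":
                # Data movement: record the transfer instead of performing it.
                self.log.append((op, addr, arg))
        del self.pending[req_id]
        return self.state.get(addr, "I")


# An assumed, heavily simplified coherence-like "program" for the controller.
COHERENT = {
    ("read", "I"): [("data", "fetch", "memory"), ("state", "set", "S")],
    ("read", "S"): [("data", "reply", "cpu")],
    ("write", "I"): [("data", "fetch", "memory"), ("state", "set", "M")],
    ("write", "S"): [("data", "invalidate", "sharers"), ("state", "set", "M")],
    ("write", "M"): [("data", "reply", "cpu")],
}

ctrl = Controller(COHERENT)
ctrl.handle(1, "read", 0x40)   # line moves from I to S
ctrl.handle(2, "write", 0x40)  # line moves from S to M
```

In this sketch, supporting a different memory model would mean supplying a different table to the same `Controller`, which is the sense in which the abstract calls the resulting memory system polymorphic.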