TMA: a trap-based memory architecture

Authors:
Håkan Zeffer; Zoran Radović; Martin Karlsson; Erik Hagersten
Affiliations:
Uppsala University, Uppsala, Sweden (all authors)
Venue:
Proceedings of the 20th annual international conference on Supercomputing
Year:
2006



Abstract

Advances in semiconductor technology have set the shared-memory server trend towards processors with multiple cores per die and multiple threads per core. We believe that this technology shift forces a reevaluation of how to interconnect multiple such chips to form larger systems.

This paper argues that by adding support for coherence traps in future chip multiprocessors, large-scale server systems can be formed at a much lower cost. The saving comes from shorter design and verification time and a shorter time to market compared with the traditional all-hardware counterpart. In the proposed trap-based memory architecture (TMA), software trap handlers are responsible for obtaining read/write permission, whereas the coherence trap hardware is responsible for the actual permission check.

In this paper we evaluate a TMA implementation (called TMA Lite) with a minimal amount of hardware extensions, all contained within the processor. The proposed mechanisms for coherence trap processing should not affect the critical path and should have a negligible cost in terms of area and power for most processor designs.

Our evaluation is based on detailed full-system simulation using out-of-order processors with one or two dual-threaded cores per die as processing nodes. The results show that a TMA-based distributed shared memory system can perform on par with a highly optimized hardware-based design.
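The division of labour described in the abstract (hardware performs the permission check, a software trap handler obtains the permission) can be illustrated with a toy model. The Python sketch below is not the paper's implementation: the names `Perm`, `Directory`, `Node`, `_allowed` and `_coherence_trap`, the three-state permission scheme, and the simple write-back and invalidate policy are all assumptions made for illustration only.

```python
from enum import Enum


class Perm(Enum):
    # Hypothetical per-line permission states (not taken from the paper).
    INVALID = 0
    READ = 1
    WRITE = 2


class Directory:
    """Hypothetical home directory tracking which nodes hold each line."""

    def __init__(self):
        self.memory = {}    # line -> value held at the home node
        self.sharers = {}   # line -> set of nodes holding a copy

    def acquire(self, node, line, want):
        holders = self.sharers.setdefault(line, set())
        for other in list(holders):
            if other is node:
                continue
            # Write back a remote dirty copy before anyone else uses the line.
            if other.perm.get(line) is Perm.WRITE:
                self.memory[line] = other.data[line]
                other.perm[line] = Perm.READ
            # A writer needs exclusive access, so invalidate other copies.
            if want is Perm.WRITE:
                other.perm[line] = Perm.INVALID
                holders.discard(other)
        holders.add(node)
        return self.memory.get(line, 0)


class Node:
    def __init__(self, directory):
        self.directory = directory
        self.perm = {}   # line -> Perm
        self.data = {}   # line -> locally held value
        self.traps = 0   # number of coherence traps taken

    def _allowed(self, line, want):
        # Stand-in for the hardware permission check on every access.
        return self.perm.get(line, Perm.INVALID).value >= want.value

    def _coherence_trap(self, line, want):
        # Stand-in for the software trap handler that obtains permission.
        self.traps += 1
        self.data[line] = self.directory.acquire(self, line, want)
        self.perm[line] = want

    def load(self, line):
        if not self._allowed(line, Perm.READ):
            self._coherence_trap(line, Perm.READ)
        return self.data[line]

    def store(self, line, value):
        if not self._allowed(line, Perm.WRITE):
            self._coherence_trap(line, Perm.WRITE)
        self.data[line] = value


directory = Directory()
a, b = Node(directory), Node(directory)
a.store(0x40, 7)   # a lacks permission, so it traps and acquires WRITE
a.store(0x40, 8)   # permission already held: the check passes, no trap
b.load(0x40)       # b traps; a's dirty copy is written back and downgraded
```

In this model the common case (permission already held) never enters software, which is the property the abstract relies on: only accesses that fail the check pay the cost of a trap handler.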