Truss: A Reliable, Scalable Server Architecture

Authors:
Brian T. Gold;Jangwoo Kim;Jared C. Smolens;Eric S. Chung;Vasileios Liaskovitis;Eriko Nurvitadhi;Babak Falsafi;James C. Hoe;Andreas G. Nowatzyk
Affiliations:
Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University;Cedars-Sinai Medical Center
Venue:
IEEE Micro
Year:
2005

Abstract

Traditional reliable servers require costly design changes to the processor, use custom system or application software, or cannot scale beyond a few processing elements. We present TRUSS, a family of server architectures providing reliable, scalable computation from distributed shared-memory hardware while requiring no changes to software. The TRUSS paradigm centers on a logical division of computation and memory that isolates errors in processing from memory storage and vice versa. In this paper, we present the key mechanisms that enable this separation and use full-system simulation to evaluate the impact on a range of commercial and scientific workloads.