Traditional reliable servers require costly design changes to the processor, use custom system or application software, or cannot scale beyond a few processing elements. We present TRUSS, a family of server architectures providing reliable, scalable computation from distributed shared-memory hardware while requiring no changes to software. The TRUSS paradigm centers on a logical division of computation and memory that isolates errors in processing from memory storage and vice versa. In this paper, we present the key mechanisms that enable this separation and use full-system simulation to evaluate their impact on a range of commercial and scientific workloads.