Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
ReVive: cost-effective architectural support for rollback recovery in shared-memory multiprocessors
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Drowsy caches: simple techniques for reducing leakage power
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
SPEComp: A New Benchmark Suite for Measuring Parallel Computer Performance
WOMPAT '01 Proceedings of the International Workshop on OpenMP Applications and Tools: OpenMP Shared Memory Parallel Programming
WildFire: A Scalable Path for SMPs
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Variability in Architectural Simulations of Multi-Threaded Workloads
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Transient-fault recovery for chip multiprocessors
Proceedings of the 30th annual international symposium on Computer architecture
Adaptive Cache Compression for High-Performance Processors
Proceedings of the 31st annual international symposium on Computer architecture
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Design space exploration for 3D architectures
ACM Journal on Emerging Technologies in Computing Systems (JETC)
Microprocessors in the era of terascale integration
Proceedings of the conference on Design, automation and test in Europe
A Low Overhead Fault Tolerant Coherence Protocol for CMP Architectures
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Argus: Low-Cost, Comprehensive Error Detection in Simple Cores
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Trading off Cache Capacity for Reliability to Enable Low Voltage Operation
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
PowerNap: eliminating server idle power
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Architecting phase change memory as a scalable dram alternative
Proceedings of the 36th annual international symposium on Computer architecture
A durable and energy efficient main memory using phase change memory technology
Proceedings of the 36th annual international symposium on Computer architecture
Scalable high performance main memory system using phase-change memory technology
Proceedings of the 36th annual international symposium on Computer architecture
Architecture Design for Soft Errors
Architecture Design for Soft Errors
Better I/O through byte-addressable, persistent memory
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Leveraging 3D PCRAM technologies to reduce checkpoint overhead for future exascale systems
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Characterizing and mitigating the impact of process variations on phase change based memory systems
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Enhancing lifetime and security of PCM-based main memory with start-gap wear leveling
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Improving cache lifetime reliability at ultra-low voltages
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Energy reduction for STT-RAM using early write termination
Proceedings of the 2009 International Conference on Computer-Aided Design
Dynamically replicated memory: building reliable systems from nanoscale resistive memories
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Use ECP, not ECC, for hard failures in resistive memories
Proceedings of the 37th annual international symposium on Computer architecture
Resistive computation: avoiding the power wall with low-leakage, STT-MRAM based computing
Proceedings of the 37th annual international symposium on Computer architecture
Soft errors issues in low-power caches
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Rebound: scalable checkpointing for coherent shared memory
Proceedings of the 38th annual international symposium on Computer architecture
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
System-level impacts of persistent main memory using a search engine
Microelectronics Journal
Hi-index | 0.00 |
Continued technology scaling presents new challenges for system-level fault tolerance and power management. Decreasing device sizes increases the likelihood of both transient and permanent faults. Increasing device count, together with the end of Dennard scaling, makes power a critical design constraint. Techniques that seek to improve system reliability frequently use more power. Similarly, many techniques that reduce power hurt system reliability. Ideally system designers should seek out techniques that mutually benefit both fault tolerance and power management. In this paper, we develop a unified technique, called UniFI, for fault tolerance and idle power management in shared memory multi-core systems. UniFI leverages emerging non-volatile memory technologies to provide an energy-efficient lightweight checkpointing technique. In addition to tolerating a large class of faults, UniFI's frequent checkpoints permit near-instant transition to a deep sleep mode to reduce idle power. UniFI incurs very low performance and energy overheads during fault-free execution--less than 2%--while taking checkpoints every 0.1ms. For typical server workloads (such as DNS), UniFI reduces average power by 82% by shutting off during idle periods.