Warped-DMR: Light-weight Error Detection for GPGPU
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
High-performance microprocessors are protected against transient and early end-of-life failures using a variety of error detection and fault isolation technologies. Execution units can be protected with duplication, parity prediction, or residue checking. Of these, residue checking has the advantage of small size. The modulus is selected based on the radix of the numbers being checked. A decimal floating-point unit handles numbers in two different bases: base 10 decimal numbers and base 2 integers. This paper discusses a residue checking system that makes it easy to check both base 2 and base 10 numbers. It describes the state-of-the-art designs currently in use as well as a novel hybrid residue system using moduli 9 and 3. The checking systems for the decimal and binary floating-point units of several recent IBM microprocessors, including the Power6, Power7, z10, and z196, are detailed.
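The idea behind residue checking can be illustrated with a small software sketch. This is not the hardware design described in the paper; it is a minimal model that assumes non-negative integer operands, and all function names are hypothetical. It relies on two standard arithmetic facts: 10 ≡ 1 (mod 9), so a decimal number's residue mod 9 is the residue of its digit sum, and 4 ≡ 1 (mod 3), so a binary number's residue mod 3 is the residue of the sum of its 2-bit groups. Because 3 divides 9, a mod 9 residue can also be reduced to a mod 3 residue, which is one plausible reason the two moduli combine well.

```python
def decimal_residue9(digits):
    """Residue mod 9 of a decimal number given as a list of digits.

    Works because 10 is congruent to 1 (mod 9), so every digit position
    contributes its digit value unchanged.
    """
    return sum(digits) % 9


def binary_residue3(value):
    """Residue mod 3 of a non-negative binary integer.

    Works because 4 is congruent to 1 (mod 3), so the 2-bit groups
    of the value can simply be summed.
    """
    total = 0
    while value:
        total += value & 0b11  # take the lowest 2-bit group
        value >>= 2
    return total % 3


def check_add(a, b, result, modulus):
    """Return True if `result` is consistent with a + b under `modulus`.

    The residues are computed independently of the main adder, so a
    mismatch signals an error in either the result or the checker.
    """
    return (a % modulus + b % modulus) % modulus == result % modulus


def check_mul(a, b, result, modulus):
    """Return True if `result` is consistent with a * b under `modulus`."""
    return ((a % modulus) * (b % modulus)) % modulus == result % modulus


if __name__ == "__main__":
    a, b = 1234, 5678
    good = a + b
    bad = good ^ (1 << 4)  # model a single-bit flip in the result

    print(check_add(a, b, good, 9))  # fault-free sum passes the check
    print(check_add(a, b, bad, 9))   # flipped bit changes the residue
    print((a % 9) % 3 == a % 3)      # a mod 9 residue reduces to mod 3
```

A residue check of this kind misses any error whose magnitude is a multiple of the modulus, which is part of the trade-off against full duplication.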