Highly fault-tolerant parallel computation

Authors:
D. A. Spielman
Venue:
FOCS '96 Proceedings of the 37th Annual Symposium on Foundations of Computer Science
Year:
1996

Cited by 13:

Computation in noisy radio networks
Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms

Molecular electronics: devices, systems and tools for gigagate, gigabit chips
Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design

Defect tolerant probabilistic design paradigm for nanotechnologies
Proceedings of the 41st annual Design Automation Conference

Rounds vs queries trade-off in noisy computation
SODA '05 Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms

Lower Bounds for the Noisy Broadcast Problem
FOCS '05 Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science

Defect tolerance at the end of the roadmap
Nano, quantum and molecular computing

'Cheap grid': Leveraging system failure using stochastic computation
Future Generation Computer Systems

A defect/error-tolerant nanosystem architecture for DSP
ACM Journal on Emerging Technologies in Computing Systems (JETC)

Low power logic for statistical inference
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design

Reliable computations based on locally decodable codes
STACS'06 Proceedings of the 23rd Annual conference on Theoretical Aspects of Computer Science

A Turing machine resisting isolated bursts of faults
SOFSEM'12 Proceedings of the 38th international conference on Current Trends in Theory and Practice of Computer Science

Making polynomials robust to noise
STOC '12 Proceedings of the forty-fourth annual ACM symposium on Theory of computing

Preserving Hamming Distance in Arithmetic and Logical Operations
Journal of Electronic Testing: Theory and Applications

Abstract

We re-introduce the coded model of fault-tolerant computation, in which the input and output of a computational device are treated as words in an error-correcting code. A computational device correctly computes a function in the coded model if its input and output, once decoded, are a valid input and output of the function. In the coded model, it is reasonable to hope to simulate all computational devices by devices whose size is greater by a constant factor but which are exponentially reliable even if each of their components can fail with some constant probability. We consider fine-grained parallel computations in which each processor has a constant probability of producing the wrong output at each time step. We show that any parallel computation that runs for time $t$ on $w$ processors can be performed reliably on a faulty machine in the coded model using $w \log^{O(1)} w$ processors and time $t \log^{O(1)} w$. The failure probability of the computation will be at most $t \cdot \exp(-w^{1/4})$. The codes used to communicate with our fault-tolerant machines are generalized Reed-Solomon codes and can thus be encoded and decoded in $O(n \log^{O(1)} n)$ sequential time and are independent of the machine they are used to communicate with. We also show how coded computation can be used to self-correct many linear functions in parallel with arbitrarily small overhead.
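The idea of the coded model can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the construction from the paper. It uses ordinary Reed-Solomon codes (the special case of generalized Reed-Solomon codes with all column multipliers equal to 1) over the prime field GF(257), with hypothetical parameters N = 12 and K = 4. It decodes with the Berlekamp-Welch algorithm, a cubic-time Gaussian-elimination decoder rather than the nearly-linear-time decoders the abstract refers to. Faults are modeled, as an assumption, by letting each symbol-level processor independently emit a random wrong value with probability `fault_rate`. Because Reed-Solomon encoding is linear, a device can compute the linear function c*m1 + m2 directly on codewords. The computation counts as correct in the coded model whenever its faulty output still decodes to the right message. All names (`encode`, `bw_decode`, `faulty_combine`, and the other helpers) are hypothetical.

```python
import random

# Toy parameters (assumptions, not from the paper): GF(257), N = 12, K = 4.
P, N, K = 257, 12, 4
ALPHAS = list(range(1, N + 1))  # distinct evaluation points 1..N

def poly_eval(coeffs, x, p=P):
    """Horner evaluation mod p; coefficients are stored lowest degree first."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def encode(msg):
    """Reed-Solomon encoding: evaluate the message polynomial at every point."""
    return [poly_eval(msg, a) for a in ALPHAS]

def solve_mod(A, b, p=P):
    """Gauss-Jordan elimination mod p; free variables set to 0, None if inconsistent."""
    rows, cols = len(A), len(A[0])
    M = [row[:] + [bi % p] for row, bi in zip(A, b)]
    pivots, r = [], 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c]), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], p - 2, p)  # Fermat inverse, p is prime
        M[r] = [v * inv % p for v in M[r]]
        for i in range(rows):
            if i != r and M[i][c]:
                f = M[i][c]
                M[i] = [(vi - f * vr) % p for vi, vr in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    if any(M[i][cols] for i in range(r, rows)):
        return None
    x = [0] * cols
    for i, c in enumerate(pivots):
        x[c] = M[i][cols]
    return x

def poly_divmod(num, den, p=P):
    """Polynomial long division mod p; returns (quotient, remainder)."""
    num = num[:]
    inv_lead = pow(den[-1], p - 2, p)
    q = [0] * max(len(num) - len(den) + 1, 1)
    for i in range(len(num) - len(den), -1, -1):
        coef = num[i + len(den) - 1] * inv_lead % p
        q[i] = coef
        for j, d in enumerate(den):
            num[i + j] = (num[i + j] - coef * d) % p
    return q, num[: len(den) - 1]

def bw_decode(y, alphas=ALPHAS, k=K, p=P):
    """Berlekamp-Welch: recovers the message if at most (n - k) // 2 symbols are wrong."""
    e = (len(y) - k) // 2
    A, b = [], []
    for a, yi in zip(alphas, y):
        # Unknowns: k + e coefficients of Q, then the e low coefficients of monic E.
        # Each point gives Q(a) - y * (E(a) - a^e) = y * a^e.
        row = [pow(a, j, p) for j in range(k + e)]
        row += [(-yi * pow(a, j, p)) % p for j in range(e)]
        A.append(row)
        b.append(yi * pow(a, e, p) % p)
    sol = solve_mod(A, b, p)
    if sol is None:
        raise ValueError("no consistent error-locator polynomial")
    Q, E = sol[: k + e], sol[k + e:] + [1]
    msg, rem = poly_divmod(Q, E, p)  # msg = Q / E when decoding succeeds
    if any(rem):
        raise ValueError("too many errors to decode")
    return (msg + [0] * k)[:k]

def faulty_combine(cw1, cw2, c, fault_rate, rng):
    """Hypothetical faulty device: computes c*cw1 + cw2 symbol by symbol; each
    symbol-level processor independently outputs a random wrong value with
    probability fault_rate (an assumed fault model)."""
    out = []
    for u, v in zip(cw1, cw2):
        val = (c * u + v) % P
        if rng.random() < fault_rate:
            val = (val + rng.randrange(1, P)) % P  # guaranteed to differ from the correct value
        out.append(val)
    return out

if __name__ == "__main__":
    rng = random.Random(2024)
    m1, m2, c = [3, 1, 4, 1], [5, 9, 2, 6], 7
    out = faulty_combine(encode(m1), encode(m2), c, fault_rate=0.15, rng=rng)
    expected = [(c * a + b) % P for a, b in zip(m1, m2)]  # c*m1 + m2 on messages
    try:
        print("decoded:", bw_decode(out), "expected:", expected)
    except ValueError as err:
        print("decoding failed:", err)
```

With these toy parameters the decoder corrects up to (N - K) // 2 = 4 wrong symbols. If more processors fail, `bw_decode` either raises `ValueError` or may silently return a different message. For this reason the demo prints both the decoded and the expected message instead of assuming success.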