We present a new technique for enabling computations to survive errors and faults while providing a bound on any resulting output distortion. A developer using the technique first partitions the computation into tasks. The execution platform then simply discards any task that encounters an error or a fault and completes the computation by executing any remaining tasks. This technique can substantially improve the robustness of the computation in the face of errors and faults. A potential concern is that discarding tasks may change the result that the computation produces.

Our technique randomly samples executions of the program at varying task failure rates to obtain a quantitative, probabilistic model that characterizes the distortion of the output as a function of the task failure rates. By providing probabilistic bounds on the distortion, the model allows users to confidently accept results produced by executions with failures as long as the distortion falls within acceptable bounds. This approach may prove to be especially useful for enabling computations to successfully survive hardware failures in distributed computing environments.

Our technique also produces a timing model that characterizes the execution time as a function of the task failure rates. The combination of the distortion and timing models quantifies an accuracy/execution time tradeoff. It therefore enables the development of techniques that purposefully fail tasks to reduce the execution time while keeping the distortion within acceptable bounds.
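The sampling idea described above can be illustrated with a small sketch. It is not the authors' implementation: the task structure (`make_tasks`), the way surviving results are combined (`combine`, which scales the partial sum up to the full task count), the relative-error distortion metric, and the choice of a least-squares line as the model (`fit_line`) are all assumptions made for illustration. Faults are simulated by discarding each task with a fixed probability.

```python
import random


def make_tasks(data, chunk_size):
    """Partition the computation into tasks; each task sums one chunk (hypothetical workload)."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [lambda c=chunk: sum(c) for chunk in chunks]


def run_with_failures(tasks, failure_rate, rng):
    """Execute tasks, discarding each one with probability failure_rate (simulated fault)."""
    results = []
    for task in tasks:
        if rng.random() < failure_rate:
            continue  # the task "failed": drop it and move on to the remaining tasks
        results.append(task())
    return results


def combine(results, total_tasks):
    """Assumed combiner: scale the surviving partial sum up to the full task count."""
    if not results:
        return 0.0
    return sum(results) * total_tasks / len(results)


def distortion(exact, approx):
    """Assumed distortion metric: relative error against the failure-free output."""
    return abs(approx - exact) / abs(exact)


def sample_distortion(tasks, rates, trials, seed=0):
    """Sample executions at each failure rate and record (rate, distortion) pairs."""
    rng = random.Random(seed)
    exact = combine([task() for task in tasks], len(tasks))
    samples = []
    for rate in rates:
        for _ in range(trials):
            approx = combine(run_with_failures(tasks, rate, rng), len(tasks))
            samples.append((rate, distortion(exact, approx)))
    return samples


def fit_line(samples):
    """Ordinary least squares fit of distortion = slope * rate + intercept.

    Requires at least two distinct rates among the samples.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    tasks = make_tasks(list(range(1, 1001)), 10)
    samples = sample_distortion(tasks, [0.0, 0.1, 0.2, 0.3, 0.4], trials=50)
    slope, intercept = fit_line(samples)
    print(f"distortion ~ {slope:.4f} * failure_rate + {intercept:.4f}")
```

A fitted line like this gives only a point estimate. Obtaining the probabilistic bounds the abstract refers to would additionally require confidence intervals on the fit, and an analogous model over measured execution times would be needed to expose the accuracy/execution time tradeoff.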