Best-effort computing: re-thinking parallel software and hardware

  • Authors: Srimat T. Chakradhar; Anand Raghunathan
  • Affiliations: NEC Laboratories America; Purdue University
  • Venue: Proceedings of the 47th Design Automation Conference
  • Year: 2010

Abstract

With the advent of mainstream parallel computing, applications can obtain better performance only by scaling to platforms with larger numbers of cores. This is widely considered to be a very challenging problem due to the difficulty of parallel programming and the bottlenecks to efficient parallel execution. Inspired by how networking and storage systems have scaled to handle very large volumes of packet traffic and persistent data, we propose a new approach to the design of scalable, parallel computing platforms. For decades, computing platforms have gone to great lengths to ensure that every computation specified by applications is faithfully executed. While this design philosophy has remained largely unchanged, applications and the basic characteristics of their workloads have changed considerably. A wide range of existing and emerging computing workloads are inherently forgiving. We therefore argue that adopting a best-effort service model for various software and hardware components of the computing platform stack can lead to drastic improvements in scalability. Under this model, applications are cognizant of the best-effort service and separate their computations into those that may be executed on a best-effort basis and those that require the traditional execution guarantees. Best-effort computations may be exploited to simply reduce the computing workload, shape it to be more suitable for parallel execution, or execute it on unreliable hardware components. Guaranteed computations are realized either through an overlay software layer on top of the best-effort substrate, or through the use of application-specific strategies. We describe a system architecture for a best-effort computing platform, provide examples of parallel software and hardware that embody the best-effort model, and show that large improvements in performance and energy efficiency are possible through the adoption of this approach.
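
The split between best-effort and guaranteed computations can be illustrated with a minimal Python sketch. It is not the authors' system: the names `best_effort_map`, `guaranteed_map`, `DROPPED`, and the `drop_prob` parameter are hypothetical, and the random dropping merely stands in for whatever reason a real substrate might have to skip work. The first function models a substrate that may silently skip a computation; the second models an overlay layer that resubmits skipped work until every item has completed.

```python
import random
from typing import Any, Callable, List, Sequence

# Hypothetical sentinel marking a computation the substrate chose not to run.
DROPPED = object()


def best_effort_map(fn: Callable[[Any], Any], items: Sequence[Any],
                    drop_prob: float, rng: random.Random) -> List[Any]:
    """Apply fn to each item; any item may be dropped with probability drop_prob."""
    results = []
    for item in items:
        if rng.random() < drop_prob:
            results.append(DROPPED)  # no execution guarantee for this item
        else:
            results.append(fn(item))
    return results


def guaranteed_map(fn: Callable[[Any], Any], items: Sequence[Any],
                   drop_prob: float, rng: random.Random,
                   max_rounds: int = 1000) -> List[Any]:
    """Overlay on the best-effort substrate: resubmit dropped items until done."""
    results: List[Any] = [DROPPED] * len(items)
    pending = list(range(len(items)))
    for _ in range(max_rounds):
        if not pending:
            break
        outcomes = best_effort_map(fn, [items[i] for i in pending], drop_prob, rng)
        still_pending = []
        for i, out in zip(pending, outcomes):
            if out is DROPPED:
                still_pending.append(i)  # try again in the next round
            else:
                results[i] = out
        pending = still_pending
    if pending:
        raise RuntimeError("substrate never completed some computations")
    return results


if __name__ == "__main__":
    rng = random.Random(0)
    data = list(range(100))
    # A forgiving computation: an approximate mean tolerates dropped items.
    partial = [r for r in best_effort_map(float, data, 0.3, rng) if r is not DROPPED]
    print("items completed:", len(partial))
    if partial:
        print("approximate mean:", sum(partial) / len(partial))
    # A computation that needs every result goes through the overlay.
    print("exact sum:", sum(guaranteed_map(float, data, 0.3, rng)))
```

A forgiving computation, such as the approximate mean above, would call the best-effort path directly and accept the missing results, while a computation that needs every result would go through the overlay and pay for the extra rounds.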