Ten ways to waste a parallel computer

Authors:
Katherine Yelick
Affiliations:
University of California at Berkeley, Berkeley, CA, USA
Venue:
Proceedings of the 36th annual international symposium on Computer architecture
Year:
2009

Citing 0
Cited 2

DRAM errors in the wild: technical perspective

Communications of the ACM
Low-Latency Mechanisms for Near-Threshold Operation of Private Caches in Shared Memory Multicores

MICROW '12 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture Workshops

Quantified Score

Hi-index	0.02

Visualization

Abstract

As clock speed increases taper off and hardware designers struggle to scale parallelism within a chip, software developers and researchers must face the challenge of writing portable software with no clear architectural target. On the hardware side, energy considerations will dominate many of the design decisions, and will ultimately limit what systems and applications can be built. This is especially true at the high end, where the next major milestone of exascale computing will be unattainable without major improvements in efficiency. Although hardware designers have long worried about the efficiency of their designs, especially for battery-operated devices, software developers in general have not. To illustrate this point, I will describe some of the top ways to waste time and therefore energy waiting for communication, synchronization, or interactions with users or other systems. Data movement, rather than computation, is the big consumer of energy, yet software often moves data up and down the memory hierarchy or across a network multiple times. At the same time, hardware designers need to take into account the constraints of the computational problems that will run on their systems, as a design that is poorly matched to the computational requirements will end up being inefficient. Drawing on my own experience in scientific computing, I will give examples of how to make the combination of hardware, algorithms and software more efficient, but also describe some of the challenges that are inherent in the application problems we want to solve. The community needs to take an integrated approach to the problem, and consider how much business or science can be done per Joule, rather than optimizing a particular component of the system in isolation. This will require rethinking the algorithms, programming models, and hardware in concert, and therefore an unprecedented level of collaboration and cooperation between hardware and software designers.