Identifying energy-efficient concurrency levels using machine learning

Authors:
Matthew Curtis-Maury;Karan Singh;Sally A. McKee;Filip Blagojevic;Dimitrios S. Nikolopoulos;Bronis R. de Supinski;Martin Schulz
Affiliations:
Department of Computer Science, Virginia Tech, Blacksburg, 24061, USA;Computer Systems Lab, Cornell University, Ithaca, NY 14853, USA;Computer Systems Lab, Cornell University, Ithaca, NY 14853, USA;Department of Computer Science, Virginia Tech, Blacksburg, 24061, USA;Department of Computer Science, Virginia Tech, Blacksburg, 24061, USA;Lawrence Livermore National Laboratory, CA 94551, USA;Lawrence Livermore National Laboratory, CA 94551, USA
Venue:
CLUSTER '07 Proceedings of the 2007 IEEE International Conference on Cluster Computing
Year:
2007

Citing 0
Cited 7

Optimizing thread throughput for multithreaded workloads on memory constrained CMPs

Proceedings of the 5th conference on Computing frontiers
Real time power estimation and thread scheduling via performance counters

ACM SIGARCH Computer Architecture News
Performance balancing: software-based on-chip memory management for effective CMP executions

Proceedings of the 10th workshop on MEmory performance: DEaling with Applications, systems and architecture
An approach to resource-aware co-scheduling for CMPs

Proceedings of the 24th ACM International Conference on Supercomputing
Comparing scalability prediction strategies on an SMP of CMPs

EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
A Green Computing Based Architecture Comparison and Analysis

GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
Dynamic adaptive virtual core mapping to improve power, energy, and performance in multi-socket multicores

Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Multicore microprocessors have been largely motivated by the diminishing returns in performance and the increased power consumption of single-threaded ILP microprocessors. With the industry already shifting from multicore to many-core microprocessors, software developers must extract more thread-level parallelism from applications. Unfortunately, low power-efficiency and diminishing returns in performance remain major obstacles with many cores. Poor interaction between software and hardware, and bottlenecks in shared hardware structures often prevent scaling to many cores, even in applications where a high degree of parallelism is potentially available. In some cases, throwing additional cores at a problem may actually harm performance and increase power consumption. Better use of otherwise limitedly beneficial cores by software components such as hypervisors and operating systems can improve system-wide performance and reliability, even in cases where power consumption is not a main concern. In response to these observations, we evaluate an approach to throttle concurrency in parallel programs dynamically. We throttle concurrency to levels with higher predicted efficiency from both performance and energy standpoints, and we do so via machine learning, specifically artificial neural networks (ANNs). One advantage of using ANNs over similar techniques previously explored is that the training phase is greatly simplified, thereby reducing the burden on the end user. Using machine learning in the context of concurrency throttling is novel. We show that ANNs are effective for identifying energy-efficient concurrency levels in multithreaded scientific applications, and we do so using physical experimentation on a state-of-the-art quad-core Xeon platform.