Machine Learning
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Exploiting Processor Workload Heterogeneity for Reducing Energy Consumption in Chip Multiprocessors
Proceedings of the conference on Design, automation and test in Europe - Volume 2
A First-Order Superscalar Processor Model
Proceedings of the 31st annual international symposium on Computer architecture
Accurate and efficient regression modeling for microarchitectural performance and power prediction
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Methods of inference and learning for performance modeling of parallel applications
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
An intra-task dvfs technique based on statistical analysis of hardware events
Proceedings of the 4th international conference on Computing frontiers
Enabling scalability and performance in a large scale CMP environment
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Efficient architectural design space exploration via predictive modeling
ACM Transactions on Architecture and Code Optimization (TACO)
Prediction-Based Power-Performance Adaptation of Multithreaded Scientific Codes
IEEE Transactions on Parallel and Distributed Systems
Identifying energy-efficient concurrency levels using machine learning
CLUSTER '07 Proceedings of the 2007 IEEE International Conference on Cluster Computing
Leakage-aware multiprocessor scheduling for low power
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Cache Conscious Task Regrouping on Multicore Processors
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Critical path-based thread placement for NUMA systems
ACM SIGMETRICS Performance Evaluation Review
Hi-index | 0.00 |
Diminishing performance returns and increasing power consumption of single-threaded processors have made chip multiprocessors (CMPs) an industry imperative. Unfortunately, poor software/hardware interaction and bottlenecks in shared hardware structures can prevent scaling to many cores. In fact, adding a core may harm performance and increase power consumption. Given these observations, we compare two approaches to predicting parallel application scalability: multiple linear regression and artificial neural networks (ANNs). We throttle concurrency to levels with higher predicted power/performance efficiency. We perform experiments on a state-of-the-art, dual-processor, quad-core platform, showing that both methodologies achieve high accuracy and identify energy-efficient concurrency levels in multithreaded scientific applications. The ANN approach has advantages, but the simpler regression-based model achieves slightly higher accuracy and performance. The approaches exhibit median error of 7.5% and 5.6%, and improve performance by an average of 7.4% and 9.5%, respectively.