CPR: Composable performance regression for scalable multiprocessor models

Authors:
Benjamin C. Lee;Jamison Collins;Hong Wang;David Brooks
Affiliations:
Microsoft Research, USA;Intel Corporation, USA;Intel Corporation, USA;Harvard University, USA
Venue:
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Year:
2008

References: 8
Cited by: 15

Accurate and efficient regression modeling for microarchitectural performance and power prediction

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Efficiently exploring architectural design spaces via predictive modeling

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
A Predictive Performance Model for Superscalar Processors

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Regression Modeling Strategies

Regression Modeling Strategies
An Efficient, Practical Parallelization Methodology for Multicore Architecture Simulation

IEEE Computer Architecture Letters
Using Predictive Modeling for Cross-Program Design Space Exploration in Multicore Systems

PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Illustrative Design Space Studies with Microarchitectural Regression Models

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Microarchitectural Design Space Exploration Using an Architecture-Centric Approach

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture

Applied inference: Case studies in microarchitectural design

ACM Transactions on Architecture and Code Optimization (TACO)
A statistical performance model of the opteron processor

ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
GROPHECY: GPU performance projection from CPU code skeletons

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Sniper: exploring the level of abstraction for scalable and accurate parallel multi-core simulation

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Compiler-Directed performance model construction for parallel programs

ARCS'10 Proceedings of the 23rd international conference on Architecture of Computing Systems
Effective and efficient microprocessor design space exploration using unlabeled design configurations

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Two
Achieving application-centric performance targets via consolidation on multicores: myth or reality?

Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Power-aware multi-core simulation for early design stage hardware/software co-optimization

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Understanding fundamental design choices in single-ISA heterogeneous multicore architectures

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Microarchitectural design space exploration made fast

Microprocessors & Microsystems
Inferred Models for Dynamic and Sparse Hardware-Software Spaces

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Flicker: a dynamically adaptive architecture for power limited multicore systems

Proceedings of the 40th Annual International Symposium on Computer Architecture
DeepDive: transparently identifying and managing performance interference in virtualized environments

USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Effective and efficient microprocessor design space exploration using unlabeled design configurations

ACM Transactions on Intelligent Systems and Technology (TIST) - Special Section on Intelligent Mobile Knowledge Discovery and Management Systems and Special Issue on Social Web Mining
What to expect when you are consolidating: effective prediction models of application performance on multicores

Cluster Computing

Abstract

Uniprocessor simulators track resource utilization cycle by cycle to estimate performance. Multiprocessor simulators, however, must also account for synchronization events, which increase the cost of every simulated cycle, and for shared resource contention, which increases the total number of cycles simulated. Together, these effects cause multiprocessor simulation times to scale superlinearly with the number of cores. Composable performance regression (CPR) addresses these intractable simulation times by estimating multiprocessor performance with a combination of uniprocessor, contention, and penalty models. The uniprocessor model predicts the baseline performance of each core, while the contention models predict interfering accesses from other cores. A penalty model then composes the uniprocessor and contention model outputs to produce the final multiprocessor performance estimate. Trained with a production-quality simulator, CPR is accurate, with median errors of 6.63 and 4.83 percent for dual- and quad-core multiprocessors, respectively. Furthermore, composable regression is scalable, requiring 0.33× the simulations required by prior regression strategies.
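
The composition described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the model forms, so the feature names (`instructions`, `base_cpi`, `shared_access_rate`), the linear formulas, and the constant `CYCLES_PER_INTERFERENCE` are all hypothetical placeholders. Only the structure (a per-core baseline, interference from the other cores, and a penalty step that combines the two) follows the text.

```python
from typing import Dict, List

Core = Dict[str, float]  # hypothetical per-core features, not from the paper

CYCLES_PER_INTERFERENCE = 2.0  # made-up penalty weight, for illustration only


def uniprocessor_model(core: Core) -> float:
    """Toy baseline: predicted cycles for a core running alone."""
    return core["instructions"] * core["base_cpi"]


def contention_model(others: List[Core]) -> float:
    """Toy contention: interfering shared-resource accesses from other cores."""
    return sum(o["instructions"] * o["shared_access_rate"] for o in others)


def penalty_model(baseline_cycles: float, interfering_accesses: float) -> float:
    """Toy penalty: compose baseline and contention into one estimate."""
    return baseline_cycles + CYCLES_PER_INTERFERENCE * interfering_accesses


def predict_multiprocessor(cores: List[Core]) -> List[float]:
    """Per-core cycle estimates for a multiprocessor, built from the three models."""
    predictions = []
    for i, core in enumerate(cores):
        others = cores[:i] + cores[i + 1:]  # every core except core i
        baseline = uniprocessor_model(core)
        interference = contention_model(others)
        predictions.append(penalty_model(baseline, interference))
    return predictions


if __name__ == "__main__":
    dual_core = [
        {"instructions": 1000.0, "base_cpi": 1.5, "shared_access_rate": 0.01},
        {"instructions": 2000.0, "base_cpi": 1.0, "shared_access_rate": 0.02},
    ]
    print(predict_multiprocessor(dual_core))
```

In this sketch a core with no neighbors gets exactly its uniprocessor estimate, and adding cores can only raise the estimate. In the paper each of the three models is a regression model trained on simulator data, which the fixed formulas above merely stand in for.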