Efficiently exploring architectural design spaces via predictive modeling

Authors:
Engin İpek;Sally A. McKee;Rich Caruana;Bronis R. de Supinski;Martin Schulz
Affiliations:
Cornell University;Cornell University;Cornell University;Lawrence Livermore National Laboratory;Lawrence Livermore National Laboratory
Venue:
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Year:
2006

Abstract

Architects use cycle-by-cycle simulation to evaluate design choices and understand tradeoffs and interactions among design parameters. Efficiently exploring exponential-size design spaces with many interacting parameters remains an open problem: the sheer number of experiments renders detailed simulation intractable. We attack this problem via an automated approach that builds accurate, confident predictive design-space models. We simulate sampled points, using the results to teach our models the function describing relationships among design parameters. The models produce highly accurate performance estimates for other points in the space, can be queried to predict performance impacts of architectural changes, and are very fast compared to simulation, enabling efficient discovery of tradeoffs among parameters in different regions. We validate our approach via sensitivity studies on memory hierarchy and CPU design spaces: our models generally predict IPC with only 1-2% error and reduce required simulation by two orders of magnitude. We also show the efficacy of our technique for exploring chip multiprocessor (CMP) design spaces: when trained on a 1% sample drawn from a CMP design space with 250K points and up to 55x performance swings among different system configurations, our models predict performance with only 4-5% error on average. Our approach combines with techniques to reduce time per simulation, achieving net time savings of three to four orders of magnitude.
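
The sample-then-predict workflow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name a model family here, so an inverse-distance-weighted nearest-neighbour regressor stands in for the paper's models; `toy_simulate` is a hypothetical closed-form stand-in for a cycle-level simulator; and the parameter names and values in `SPACE` are invented.

```python
import itertools
import math
import random

# Hypothetical design space: parameter names and values are illustrative only.
SPACE = {
    "l1_kb": [8, 16, 32, 64, 128],
    "l2_kb": [256, 512, 1024, 2048],
    "issue_width": [2, 4, 6, 8],
    "rob_entries": [32, 64, 96, 128, 160],
}


def toy_simulate(point):
    """Stand-in for a cycle-level simulator; returns a made-up IPC value."""
    l1, l2, width, rob = point
    miss_penalty = 2.0 / math.log2(l1) + 4.0 / math.log2(l2)
    window_limit = 1.0 - math.exp(-rob / (16.0 * width))
    return width * window_limit / (1.0 + miss_penalty)


def normalize(point, ranges):
    """Map each parameter to [0, 1] on a log scale so distances are comparable."""
    return tuple(
        (math.log2(v) - math.log2(lo)) / (math.log2(hi) - math.log2(lo))
        for v, (lo, hi) in zip(point, ranges)
    )


def predict(query, train, ranges, k=4):
    """Inverse-distance-weighted k-nearest-neighbour estimate of IPC."""
    q = normalize(query, ranges)
    scored = sorted(
        (math.dist(q, normalize(p, ranges)), ipc) for p, ipc in train.items()
    )[:k]
    if scored[0][0] == 0.0:  # query was itself simulated: return that result
        return scored[0][1]
    weights = [1.0 / d for d, _ in scored]
    return sum(w * ipc for w, (_, ipc) in zip(weights, scored)) / sum(weights)


def explore(sample_fraction=0.1, seed=0):
    """Simulate a random sample, then predict every remaining design point."""
    points = list(itertools.product(*SPACE.values()))
    ranges = [(min(v), max(v)) for v in SPACE.values()]
    rng = random.Random(seed)
    sampled = rng.sample(points, max(1, round(len(points) * sample_fraction)))
    train = {p: toy_simulate(p) for p in sampled}  # the only "simulations" run
    held_out = [p for p in points if p not in train]
    # Mean relative error of the model on points that were never simulated.
    errors = [
        abs(predict(p, train, ranges) - toy_simulate(p)) / toy_simulate(p)
        for p in held_out
    ]
    return len(points), len(train), sum(errors) / len(errors)


if __name__ == "__main__":
    total, simulated, mean_err = explore()
    print(f"simulated {simulated} of {total} points; "
          f"mean error on the rest: {100 * mean_err:.1f}%")
```

The error this toy reports says nothing about the accuracy figures in the abstract; it only shows the shape of the loop (sample, simulate, fit, query). For scale, the 1% sample of a 250K-point space mentioned above corresponds to about 2,500 simulated configurations.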