Efficient smart monte carlo based SSTA on graphics processing units with improved resource utilization

Authors:
Vineeth Veetil;Yung-Hsu Chang;Dennis Sylvester;David Blaauw
Affiliations:
University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI
Venue:
Proceedings of the 47th Design Automation Conference
Year:
2010

Abstract

To exploit the benefits of throughput-optimized processors such as GPUs, applications must be redesigned for both performance and efficiency. In this work, we present techniques to speed up statistical timing analysis on throughput processors. We draw upon advances that improve the efficiency of Monte Carlo based statistical static timing analysis (MC SSTA) by reducing the sample size, known as smart sampling techniques. An efficient smart sampling technique, Stratification + Hybrid Quasi Monte Carlo (SH-QMC), is implemented on a GPU based on the NVIDIA CUDA architecture. We show that although this application is based on MC analysis, where straightforward parallelism is available, achieving performance and efficiency on the GPU requires exposing more parallelism and finding locality in computations. This is in contrast with random sampling based algorithms, which are inefficient in terms of sample size but can keep GPU resources utilized. We show that SH-QMC implemented on multiple GPUs is twice as fast as a single STA on a CPU for the benchmark circuits considered. In terms of an efficiency metric, which measures the ability to convert a reduction in sample size into a corresponding reduction in runtime with respect to a random sampling approach, we achieve 73.9% efficiency with the proposed approaches, compared to 4.3% for an implementation that performs computations on smart samples in parallel. Another contribution of the paper is a critical graph analysis technique that improves the efficiency of Monte Carlo based SSTA, leading to a further 2-9X speedup.
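
For readers unfamiliar with MC SSTA, the baseline that smart sampling improves upon can be sketched as follows. This is a minimal illustration of plain random-sampling Monte Carlo timing analysis on a CPU, not the SH-QMC method or the GPU implementation described in the abstract. The toy timing graph, the names `TIMING_GRAPH`, `sample_circuit_delay`, and `mc_ssta`, the 5% Gaussian variation, and the assumption that edge delays vary independently are all hypothetical choices made for the example.

```python
import random

# Hypothetical toy timing graph: node -> list of (fanin node, nominal edge delay).
# Nodes are listed in topological order; "in" is the primary input.
TIMING_GRAPH = {
    "in": [],
    "g1": [("in", 1.0)],
    "g2": [("in", 1.5)],
    "out": [("g1", 2.0), ("g2", 1.0)],
}


def sample_circuit_delay(rng, sigma=0.05):
    """One deterministic STA pass over a single random sample of edge delays.

    Assumption: each edge delay is scaled by an independent Gaussian factor
    with relative standard deviation `sigma` (no spatial correlation).
    """
    arrival = {}
    for node, fanins in TIMING_GRAPH.items():
        # Arrival time is the latest fanin arrival plus the sampled edge delay.
        arrival[node] = max(
            (arrival[src] + d * (1.0 + rng.gauss(0.0, sigma)) for src, d in fanins),
            default=0.0,
        )
    return arrival["out"]


def mc_ssta(num_samples, seed=0):
    """Return the sorted circuit delays over `num_samples` random samples."""
    rng = random.Random(seed)
    return sorted(sample_circuit_delay(rng) for _ in range(num_samples))


delays = mc_ssta(1000)
mean_delay = sum(delays) / len(delays)
p99_delay = delays[int(0.99 * len(delays)) - 1]  # empirical 99th-percentile delay
```

Each sample is an independent STA pass, which is the straightforward parallelism the abstract refers to. Smart sampling aims to reach the same accuracy with far fewer samples, which in turn leaves less sample-level parallelism to keep a GPU busy.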