Accelerating statistical static timing analysis using graphics processing units

Authors: Kanupriya Gulati; Sunil P. Khatri
Affiliations: Texas A&M University, College Station, TX (both authors)
Venue: Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Year: 2009

Abstract

In this paper, we explore the implementation of Monte Carlo based statistical static timing analysis (SSTA) on a Graphics Processing Unit (GPU). SSTA via Monte Carlo simulation is a computationally expensive but important step in achieving design timing closure, because it provides an accurate estimate of delay variations and their impact on design yield. The large number of threads that a GPU can execute in parallel makes Monte Carlo based SSTA a natural fit for the GPU platform. Our implementation performs multiple delay simulations at a single gate in parallel. A parallel implementation of the Mersenne Twister pseudo-random number generator on the GPU, followed by Box-Muller transformations (also implemented on the GPU), is used to generate normally distributed gate delays. The μ and σ of the pin-to-output delay distributions for all inputs of every gate are obtained using a memory lookup, which benefits from the large memory bandwidth of the GPU. Threads that execute in parallel have no data or control dependencies on each other. All threads execute identical instructions, but on different data, as required by the Single Instruction Multiple Data (SIMD) programming semantics of the GPU. Our approach is implemented on an NVIDIA GeForce 8800 GTX GPU card. Our results indicate that our approach achieves an average speedup of about 260x over a serial CPU implementation. With the recently announced quad 8800 GPU cards, we estimate that our approach would attain a speedup of over 785x. The correctness of the GPU-based Monte Carlo SSTA has been verified by comparing its results with those of a CPU-based implementation.
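The per-gate Monte Carlo step described above can be illustrated with a small serial Python sketch. It is not the authors' GPU implementation, and it rests on several assumptions. The three-gate netlist, the `DELAY_TABLE` values, and the helper names (`box_muller`, `normal_stream`, `monte_carlo_ssta`, `mean_std`) are hypothetical. CPython's `random` module, whose core generator is the Mersenne Twister (MT19937), stands in for the parallel GPU Mersenne Twister. Primary inputs are assumed to arrive at time 0. Each gate's output arrival time is taken as the maximum, over its input pins, of the input arrival time plus a pin-to-output delay sampled from N(μ, σ), which is the usual block-based arrival-time rule.

```python
import math
import random

# Hypothetical netlist, listed in topological order: gate -> fan-in signals.
NETLIST = {
    "g1": ["a", "b"],
    "g2": ["b", "c"],
    "g3": ["g1", "g2"],
}
PRIMARY_INPUTS = ["a", "b", "c"]

# Hypothetical lookup table: (gate, input pin index) -> (mu, sigma) of the
# pin-to-output delay distribution (the "memory lookup" in the abstract).
DELAY_TABLE = {
    ("g1", 0): (1.0, 0.10), ("g1", 1): (1.2, 0.12),
    ("g2", 0): (0.9, 0.09), ("g2", 1): (1.1, 0.11),
    ("g3", 0): (1.5, 0.15), ("g3", 1): (1.4, 0.14),
}


def box_muller(u1, u2):
    """Map u1 in (0, 1] and u2 in [0, 1) to two independent N(0, 1) samples."""
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)


def normal_stream(rng):
    """Yield standard normals built from Mersenne Twister uniforms."""
    while True:
        u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log(u1) is defined
        u2 = rng.random()
        z0, z1 = box_muller(u1, u2)
        yield z0
        yield z1


def monte_carlo_ssta(netlist, primary_inputs, delay_table, n_samples, seed=2009):
    """Return, for every signal, a list of n_samples sampled arrival times."""
    rng = random.Random(seed)  # CPython's random.Random is MT19937
    normals = normal_stream(rng)
    arrival = {pi: [0.0] * n_samples for pi in primary_inputs}
    for gate, fanins in netlist.items():  # dicts preserve (topological) order
        out = [0.0] * n_samples
        # On a GPU, each iteration of this loop would be one independent thread
        # running the same instructions on different random numbers.
        for s in range(n_samples):
            worst = -math.inf
            for pin, src in enumerate(fanins):
                mu, sigma = delay_table[(gate, pin)]
                delay = mu + sigma * next(normals)
                worst = max(worst, arrival[src][s] + delay)
            out[s] = worst
        arrival[gate] = out
    return arrival


def mean_std(xs):
    """Sample mean and (unbiased) sample standard deviation."""
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, math.sqrt(var)


if __name__ == "__main__":
    samples = monte_carlo_ssta(NETLIST, PRIMARY_INPUTS, DELAY_TABLE, 10000)["g3"]
    mu, sigma = mean_std(samples)
    print(f"g3 arrival: mean={mu:.3f}, std={sigma:.3f}")
```

The loop over `s` is the part that maps onto GPU threads: every sample reads the same (μ, σ) table entries and has no data or control dependency on other samples. In this serial sketch all samples draw from one shared generator. A GPU port would instead give each thread its own generator state, as a parallel Mersenne Twister does.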