Statistical delay modeling in logic design and synthesis
DAC '94 Proceedings of the 31st annual Design Automation Conference
Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator
ACM Transactions on Modeling and Computer Simulation (TOMACS) - Special issue on uniform random number generation
Fast statistical timing analysis by probabilistic event propagation
Proceedings of the 38th annual Design Automation Conference
Principal Component Analysis on Vector Computers
VECPAR '96 Selected papers from the Second International Conference on Vector and Parallel Processing
First-order incremental block-based statistical timing analysis
Proceedings of the 41st annual Design Automation Conference
STAC: statistical timing analysis with correlation
Proceedings of the 41st annual Design Automation Conference
Block-based Static Timing Analysis with Uncertainty
Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
Statistical Timing Analysis for Intra-Die Process Variations with Spatial Correlations
Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
Statistical Timing Analysis Using Bounds
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Statistical Timing Analysis with Extended Pseudo-Canonical Timing Model
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
GPU Cluster for High Performance Computing
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
GPGPU: general-purpose computation on graphics hardware
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
ACM SIGGRAPH 2007 courses
Towards acceleration of fault simulation using graphics processing units
Proceedings of the 45th annual Design Automation Conference
Statistical timing analysis using bounds and selective enumeration
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Statistical timing analysis under spatial correlations
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
A fast high quality pseudo random number generator for nVidia CUDA
Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers
Statistical static timing analysis considering leakage variability in power gated designs
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Introduction to GPU programming for EDA
Proceedings of the 2009 International Conference on Computer-Aided Design
Accelerating Monte Carlo based SSTA using FPGA
Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays
Highly parallel decoding of space-time codes on graphics processing units
Allerton'09 Proceedings of the 47th annual Allerton conference on Communication, control, and computing
Proceedings of the 47th Design Automation Conference
Gate-Level Simulation with GPU Computing
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Exploring high throughput computing paradigm for global routing
Proceedings of the International Conference on Computer-Aided Design
A Map-Reduce Based Framework for Heterogeneous Processing Element Cluster Environments
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Hi-index | 0.00 |
In this paper, we explore the implementation of Monte Carlo based statistical static timing analysis (SSTA) on a Graphics Processing Unit (GPU). SSTA via Monte Carlo simulations is a computationally expensive, but important step required to achieve design timing closure. It provides an accurate estimate of delay variations and their impact on design yield. The large number of threads that can be computed in parallel on a GPU suggests a natural fit for the problem of Monte Carlo based SSTA to the GPU platform. Our implementation performs multiple delay simulations at a single gate in parallel. A parallel implementation of the Mersenne Twister pseudo-random number generator on the GPU, followed by Box-Muller transformations (also implemented on the GPU) is used for generating gate delay numbers from a normal distribution. The μ and σ of the pin-to-output delay distributions for all inputs and for every gate, are obtained using a memory lookup, which benefits from the large memory bandwidth of the GPU. Threads which execute in parallel have no data/control dependencies on each other. All threads compute identical instructions, but on different data, as required by the Single Instruction Multiple Data (SIMD) programming semantics of the GPU. Our approach is implemented on a NVIDIA GeForce GTX 8800 GPU card. Our results indicate that our approach can obtain an average speedup of about 260x as compared to a serial CPU implementation. With the recently announced quad 8800 GPU cards, we estimate that our approach would attain a speedup of over 785x. The correctness of the Monte Carlo based SSTA implemented on a GPU has been verified by comparing its results with a CPU based implementation.