Fast optical and process proximity correction algorithms for integrated circuit manufacturing
Proceedings of the 2007 IEEE/ACM international conference on Computer-aided design
Lithographic aerial image simulation with FPGA-based hardware acceleration
Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
Byte and modulo addressable parallel memory architecture for video coding
IEEE Transactions on Circuits and Systems for Video Technology
LegUp: high-level synthesis for FPGA-based processor/accelerator systems
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
Accelerating aerial image simulation with GPU
Proceedings of the International Conference on Computer-Aided Design
Architecture support for accelerator-rich CMPs
Proceedings of the 49th Annual Design Automation Conference
LegUp: An open-source high-level synthesis tool for FPGA-based processor/accelerator systems
ACM Transactions on Embedded Computing Systems (TECS) - Special issue on application-specific processors
From software to accelerators with LegUp high-level synthesis
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Efficient aerial image simulation on multi-core SIMD CPU
Proceedings of the International Conference on Computer-Aided Design
From design to design automation
Proceedings of the 2014 on International symposium on physical design
Lithography simulation, an essential step in design for manufacturability (DFM), is still far from computationally efficient: most leading companies rely on large clusters of server computers to achieve acceptable turnaround time. Coprocessor acceleration is therefore very attractive for increasing computational performance while reducing power consumption. This article describes the implementation of a customized FPGA accelerator using a polygon-based simulation model. An application-specific memory partitioning scheme is designed to meet the bandwidth requirements of a large number of processing elements. Deep loop pipelining and ping-pong-buffer-based function-block pipelining are also implemented in our design. Initial results show a 15X speedup over the software implementation running on a microprocessor, and further performance tuning is expected to yield additional gains. The implementation also leverages state-of-the-art C-to-RTL synthesis tools; at the same time, we identify the need for manual architecture-level exploration for parallel implementations. Moreover, we implement the algorithm on NVIDIA GPUs using the CUDA programming environment and provide useful comparisons between the different kinds of accelerators.