Scalable and deterministic timing-driven parallel placement for FPGAs

Authors:
Chris C. Wang; Guy G.F. Lemieux
Affiliations:
University of British Columbia, Vancouver, BC, Canada; University of British Columbia, Vancouver, BC, Canada
Venue:
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
Year:
2011

Abstract

This paper describes a parallel implementation of the timing-driven VPR 5.0 simulated annealing engine. By restricting the move distance to a confined neighborhood, it is possible to consider a large number of non-conflicting moves in parallel and achieve a deterministic result. The full timing-driven algorithm is parallelized, including the detailed timing analysis updates performed periodically as placement progresses. Limiting the move distance slightly degrades placement quality, but this is necessary to expose greater degrees of parallelism. The overall bounding box metric degrades by about 11% and the critical path delay metric by about 8% compared to VPR's original algorithm, but we show that the amount of degradation is independent of the number of threads. Overall, the parallel implementation scales to a speedup of 123x using 25 threads compared to VPR. With additional tuning effort, we believe the algorithm can be scaled to a larger number of threads, perhaps even run on a GPU, with little additional quality degradation.
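
To illustrate the idea in the abstract (moves confined to a neighborhood so that many of them can be evaluated in parallel with a deterministic outcome), here is a minimal Python sketch. It is not the paper's implementation or VPR code. The names `parallel_place` and `anneal_region`, the square regions with an alternating half-region offset, the per-region random streams, the cooling schedule, and the wirelength-only cost (no timing analysis) are all assumptions made for illustration. Python threads will not show a real speedup here; the sketch only demonstrates why the result does not depend on the number of workers.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def hpwl(net, pos):
    """Half-perimeter wirelength of one net, given block -> (x, y)."""
    xs = [pos[b][0] for b in net]
    ys = [pos[b][1] for b in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def anneal_region(seed, blocks, snapshot, nets_of, nets, temperature, moves):
    """Try swaps between blocks of a single region (hypothetical helper).

    Blocks outside the region are read from a snapshot taken at the start
    of the pass, so concurrent regions never write to shared state.
    """
    if len(blocks) < 2:
        return {}
    rng = random.Random(seed)      # private, reproducible random stream
    local = dict(snapshot)         # private copy; only `blocks` entries change
    for _ in range(moves):
        a, b = rng.sample(blocks, 2)
        affected = set(nets_of[a]) | set(nets_of[b])
        before = sum(hpwl(nets[n], local) for n in affected)
        local[a], local[b] = local[b], local[a]
        delta = sum(hpwl(nets[n], local) for n in affected) - before
        # Standard annealing acceptance test; undo the swap on rejection.
        if delta > 0 and rng.random() >= math.exp(-delta / temperature):
            local[a], local[b] = local[b], local[a]
    return {blk: local[blk] for blk in blocks}


def parallel_place(pos, nets, region_size, passes, workers, moves=50, seed=0):
    """Region-partitioned annealing sketch; returns a new block -> (x, y) map."""
    pos = dict(pos)
    nets_of = {b: [] for b in pos}
    for i, net in enumerate(nets):
        for b in net:
            nets_of[b].append(i)
    temperature = 2.0              # assumed starting temperature
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for p in range(passes):
            # Shift region boundaries on alternate passes so blocks can
            # migrate across the grid over time (an assumption of this sketch).
            off = (p % 2) * (region_size // 2)
            regions = {}
            for blk in sorted(pos):
                x, y = pos[blk]
                key = ((x + off) // region_size, (y + off) // region_size)
                regions.setdefault(key, []).append(blk)
            snapshot = dict(pos)
            jobs = [
                pool.submit(anneal_region, f"{seed}-{p}-{key}", blocks,
                            snapshot, nets_of, nets, temperature, moves)
                for key, blocks in sorted(regions.items())
            ]
            # Regions are disjoint, so merging never overwrites another
            # region's result and the merge order does not matter.
            for job in jobs:
                pos.update(job.result())
            temperature *= 0.8     # assumed cooling schedule
    return pos
```

Because each region draws from its own seeded stream and reads only the snapshot plus its own blocks, calling `parallel_place` with `workers=1` and `workers=4` on the same input yields the same placement. The price is that a block can only move within its current region during a pass, which mirrors the quality-for-parallelism trade-off the abstract describes.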