Efficient and Deterministic Parallel Placement for FPGAs

Authors:
Adrian Ludwin; Vaughn Betz
Affiliations:
Altera Corporation; Altera Corporation
Venue:
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Year:
2011

Abstract

We describe a parallel simulated annealing algorithm for FPGA placement. The algorithm proposes and evaluates multiple moves in parallel, and has been incorporated into Altera’s Quartus II CAD system. Across a set of 18 industrial benchmark circuits, we achieve geometric average speedups during the quench of 2.7x and 4.0x on four and eight processors, respectively, with individual circuits achieving speedups of up to 3.6x and 5.9x. Over the course of the entire anneal, we achieve speedups of up to 2.8x and 3.7x, with geometric average speedups of 2.1x and 2.4x. Our algorithm is the first parallel placer to optimize for criteria other than wirelength, such as critical path length, and is one of the few deterministic parallel placement algorithms. We discuss the challenges involved in combining these two features and the new techniques we used to overcome them. We also quantify the impact of maintaining determinism on eight cores, and find that while it reduces performance by approximately 15% relative to an ideal speedup of 8.0x, hardware limitations are a larger factor and reduce performance by 30-40%. We then suggest possible enhancements to allow our approach to scale to 16 cores and beyond.
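To make "proposes and evaluates multiple moves in parallel" while staying deterministic more concrete, here is a minimal sketch of one way such a scheme can be structured. It is not the Quartus II implementation, which the abstract does not detail. It assumes a wirelength-only cost (the paper also optimizes critical path length), swap moves, and a simple batch-and-commit policy. The names `wirelength`, `swap_delta`, `anneal`, and all parameters are hypothetical. The idea is that one seeded random stream on the main thread generates every proposal and acceptance draw, workers only compute cost deltas against a frozen snapshot, and moves are committed in proposal order, skipping any move whose delta has gone stale.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def wirelength(pos, nets):
    """Half-perimeter wirelength summed over the given nets."""
    total = 0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def swap_delta(pos, nets, a, b):
    """Cost change from swapping cells a and b; pos is read, never modified."""
    trial = dict(pos)
    trial[a], trial[b] = pos[b], pos[a]
    affected = [n for n in nets if a in n or b in n]
    return wirelength(trial, affected) - wirelength(pos, affected)


def anneal(pos, nets, workers, batch=8, iters=50, temp=2.0, seed=1):
    """Batch-parallel annealing whose result does not depend on `workers`."""
    pos = dict(pos)
    rng = random.Random(seed)  # single random stream, main thread only
    cells = sorted(pos)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iters):
            # Proposals and acceptance draws are fixed before any evaluation.
            moves = [tuple(rng.sample(cells, 2)) for _ in range(batch)]
            draws = [rng.random() for _ in range(batch)]
            snapshot = dict(pos)
            # Executor.map returns results in input order, not finish order.
            deltas = list(pool.map(
                lambda m: swap_delta(snapshot, nets, m[0], m[1]), moves))
            moved, dirty = set(), set()  # cells and net indices already changed
            for (a, b), delta, u in zip(moves, deltas, draws):
                hit = {i for i, n in enumerate(nets) if a in n or b in n}
                if a in moved or b in moved or hit & dirty:
                    continue  # delta is stale relative to committed moves
                if delta <= 0 or u < math.exp(-delta / temp):
                    pos[a], pos[b] = pos[b], pos[a]
                    moved.update((a, b))
                    dirty |= hit
            temp *= 0.95  # geometric cooling, chosen arbitrarily here
    return pos
```

Calling `anneal(pos, nets, workers=1)` and `anneal(pos, nets, workers=4)` with the same seed yields the same placement, because the worker count affects only how the deltas are computed and not which moves are proposed or in what order they are committed. In CPython the threads will not deliver a real speedup on this pure-Python cost function; the sketch illustrates only the control structure. The discarded stale moves are one plausible source of the kind of determinism overhead the abstract quantifies.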