Parallel implementations of the statistical cooling algorithm
Integration, the VLSI Journal
Optimal partitioners and end-case placers for standard-cell layout
ISPD '99 Proceedings of the 1999 international symposium on Physical design
Parallel algorithms for FPGA placement
GLSVLSI '00 Proceedings of the 10th Great Lakes symposium on VLSI
Architecture and CAD for Deep-Submicron FPGAs
Architecture and CAD for Deep-Submicron FPGAs
Multilevel optimization for large-scale circuit placement
Proceedings of the 2000 IEEE/ACM international conference on Computer-aided design
Hardware-assisted simulated annealing with application for fast FPGA placement
FPGA '03 Proceedings of the 2003 ACM/SIGDA eleventh international symposium on Field programmable gate arrays
Parallel placement for field-programmable gate arrays
FPGA '03 Proceedings of the 2003 ACM/SIGDA eleventh international symposium on Field programmable gate arrays
Parallel Simulated Annealing Algorithms for Cell Placement on Hypercube Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Parallel Simulated Annealing using Speculative Computation
IEEE Transactions on Parallel and Distributed Systems
ProperPLACE: A Portable Parallel Algorithm for Standard Cell Placement
Proceedings of the 8th International Symposium on Parallel Processing
Parallel simulated annealing strategies for VLSI cell placement
VLSID '96 Proceedings of the 9th International Conference on VLSI Design: VLSI in Mobile Communication
A Parallel Genetic Approach to the Placement Problem for Field Programmable Gate Arrays
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Proceedings of the 2004 international symposium on Physical design
A Study of a Transactional Parallel Routing Algorithm
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
High-quality, deterministic parallel placement for FPGAs on commodity hardware
Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Improving simulated annealing-based FPGA placement with directed moves
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Towards scalable placement for FPGAs
Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays
An evaluation of parallel simulated annealing strategies with application to standard cell placement
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
A parallel standard cell placement algorithm
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Efficient and effective placement for very large circuits
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Hi-index | 0.00 |
We describe a parallel simulated annealing algorithm for FPGA placement. The algorithm proposes and evaluates multiple moves in parallel, and has been incorporated into Altera’s Quartus II CAD system. Across a set of 18 industrial benchmark circuits, we achieve geometric average speedups during the quench of 2.7x and 4.0x on four and eight processors, respectively, with individual circuits achieving speedups of up to 3.6x and 5.9x. Over the course of the entire anneal, we achieve speedups of up to 2.8x and 3.7x, with geometric average speedups of 2.1x and 2.4x. Our algorithm is the first parallel placer to optimize for criteria other than wirelength, such as critical path length, and is one of the few deterministic parallel placement algorithms. We discuss the challenges involved in combining these two features and the new techniques we used to overcome them. We also quantify the impact of maintaining determinism on eight cores, and find that while it reduces performance by approximately 15% relative to an ideal speedup of 8.0x, hardware limitations are a larger factor and reduce performance by 30--40%. We then suggest possible enhancements to allow our approach to scale to 16 cores and beyond.