High-quality, deterministic parallel placement for FPGAs on commodity hardware

Authors:
Adrian Ludwin; Vaughn Betz; Ketan Padalia
Affiliations:
Altera Corporation, Toronto, ON, Canada (all authors)
Venue:
Proceedings of the 16th International ACM/SIGDA Symposium on Field-Programmable Gate Arrays (FPGA '08)
Year:
2008

Abstract

In this paper, we describe the application of two parallelization strategies to the Quartus II FPGA placer. The first uses a pipelining approach and achieves a speedup of 1.3x on two processing cores. The second uses a parallel-moves approach and achieves a speedup of 2.2x on four cores. Unlike all previous parallel-moves algorithms, ours is deterministic: it always produces the same result as the serial version of the algorithm, without any significant reduction in performance. We also describe a method for quantifying multi-core performance effects, such as memory-subsystem limitations and explicit synchronization overhead, and, for the first time, fully characterize these effects in a CAD tool. Memory limitations alone account for up to 35% of total runtime. Unlike previous algorithms, ours have negligible explicit synchronization overhead. These results are relevant both to CAD designers and to any developers seeking to parallelize existing software.
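The abstract does not spell out how a parallel-moves placer can match the serial result exactly. One way to get that property, offered here as an assumption and not as the authors' design, is to: draw moves from a random stream that never depends on acceptance outcomes; evaluate a batch of moves concurrently against a frozen snapshot; then commit them in the original serial order, re-evaluating any move whose inputs (block positions or site occupancy) were changed by an earlier accepted move in the same batch. The toy swap-move model, half-perimeter wirelength cost, fixed temperature, and every function name below (`evaluate`, `commit`, `anneal_serial`, `anneal_parallel`, `demo`) are hypothetical. Under CPython's GIL the thread pool shows the determinism argument, not a real speedup.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy model: pos maps block -> (x, y) site, grid maps site -> block,
# nets is a list of block tuples, nets_of maps block -> indices of its nets.

def hpwl(net, where):
    """Half-perimeter wirelength of one net; where(block) gives its site."""
    xs = [where(k)[0] for k in net]
    ys = [where(k)[1] for k in net]
    return max(xs) - min(xs) + max(ys) - min(ys)

def evaluate(move, pos, grid, nets_of, nets):
    """Return (cost delta, blocks read, sites read) for moving block b to target."""
    b, target, _u = move
    other = grid.get(target)  # current occupant of target, swapped with b
    trial = {b: target}
    if other is not None and other != b:
        trial[other] = pos[b]
    touched = {n for m in trial for n in nets_of[m]}
    read_blocks = set(trial) | {k for n in touched for k in nets[n]}
    before = sum(hpwl(nets[n], lambda k: pos[k]) for n in touched)
    after = sum(hpwl(nets[n], lambda k: trial.get(k, pos[k])) for n in touched)
    return after - before, read_blocks, {target}

def accept(delta, u, temp):
    """Metropolis test using the move's pre-drawn uniform number u."""
    return delta <= 0 or u < math.exp(-delta / temp)

def commit(b, target, pos, grid):
    """Apply the move in place; return the (blocks, sites) it wrote."""
    old = pos[b]
    if old == target:
        return set(), set()
    other = grid.get(target)
    grid[target], pos[b] = b, target
    if other is None:
        del grid[old]
        return {b}, {old, target}
    grid[old], pos[other] = other, old
    return {b, other}, {old, target}

def anneal_serial(moves, pos, grid, nets_of, nets, temp):
    pos, grid = dict(pos), dict(grid)
    for move in moves:
        delta, _, _ = evaluate(move, pos, grid, nets_of, nets)
        if accept(delta, move[2], temp):
            commit(move[0], move[1], pos, grid)
    return pos

def anneal_parallel(moves, pos, grid, nets_of, nets, temp, batch=4):
    pos, grid = dict(pos), dict(grid)
    with ThreadPoolExecutor(max_workers=batch) as pool:
        for start in range(0, len(moves), batch):
            chunk = moves[start:start + batch]
            # Phase 1: evaluate the whole batch concurrently against one snapshot.
            results = list(pool.map(
                lambda m: evaluate(m, pos, grid, nets_of, nets), chunk))
            # Phase 2: commit in serial order, redoing any stale evaluation.
            dirty_blocks, dirty_sites = set(), set()
            for move, (delta, rb, rs) in zip(chunk, results):
                if rb & dirty_blocks or rs & dirty_sites:
                    delta, _, _ = evaluate(move, pos, grid, nets_of, nets)
                if accept(delta, move[2], temp):
                    wb, ws = commit(move[0], move[1], pos, grid)
                    dirty_blocks |= wb
                    dirty_sites |= ws
    return pos

def demo(seed=1, batch=4, width=6, height=6, n_blocks=20, n_nets=25,
         n_moves=2000, temp=1.0):
    """Build a random toy placement and run both annealers on the same moves."""
    rng = random.Random(seed)
    blocks = list(range(n_blocks))
    sites = [(x, y) for x in range(width) for y in range(height)]
    pos = dict(zip(blocks, rng.sample(sites, n_blocks)))
    grid = {s: b for b, s in pos.items()}
    nets = [tuple(rng.sample(blocks, rng.randint(2, 4))) for _ in range(n_nets)]
    nets_of = {b: [i for i, n in enumerate(nets) if b in n] for b in blocks}
    # Each move consumes the same random draws whether or not it is accepted,
    # so the move stream is fixed by the seed alone.
    moves = [(rng.choice(blocks), (rng.randrange(width), rng.randrange(height)),
              rng.random()) for _ in range(n_moves)]
    serial = anneal_serial(moves, pos, grid, nets_of, nets, temp)
    parallel = anneal_parallel(moves, pos, grid, nets_of, nets, temp, batch)
    return serial, parallel

if __name__ == "__main__":
    serial, parallel = demo(batch=8)
    print(serial == parallel)  # True: identical to the serial placement
```

In this sketch the batch size changes only how much work must be redone in the serial commit phase, not the final placement; larger batches expose more concurrent evaluation but make it more likely that a later move's snapshot evaluation has gone stale.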