Parallel implementations of the statistical cooling algorithm
Integration, the VLSI Journal
A loosely coupled parallel algorithm for standard cell placement
ICCAD '94 Proceedings of the 1994 IEEE/ACM international conference on Computer-aided design
Parallel algorithms for FPGA placement
GLSVLSI '00 Proceedings of the 10th Great Lakes symposium on VLSI
Hardware-assisted simulated annealing with application for fast FPGA placement
FPGA '03 Proceedings of the 2003 ACM/SIGDA eleventh international symposium on Field programmable gate arrays
Parallel placement for field-programmable gate arrays
FPGA '03 Proceedings of the 2003 ACM/SIGDA eleventh international symposium on Field programmable gate arrays
Parallel Simulated Annealing Algorithms for Cell Placement on Hypercube Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Parallel Simulated Annealing using Speculative Computation
IEEE Transactions on Parallel and Distributed Systems
ProperPLACE: A Portable Parallel Algorithm for Standard Cell Placement
Proceedings of the 8th International Symposium on Parallel Processing
VPR: A new packing, placement and routing tool for FPGA research
FPL '97 Proceedings of the 7th International Workshop on Field-Programmable Logic and Applications
Parallel simulated annealing strategies for VLSI cell placement
VLSID '96 Proceedings of the 9th International Conference on VLSI Design: VLSI in Mobile Communication
An evaluation of parallel simulated annealing strategies with application to standard cell placement
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Improving simulated annealing-based FPGA placement with directed moves
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Parallel multi-level analytical global placement on graphics processing units
Proceedings of the 2009 International Conference on Computer-Aided Design
Towards scalable placement for FPGAs
Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays
Speeding up FPGA placement via partitioning and multithreading
International Journal of Reconfigurable Computing
Line-level incremental resynthesis techniques for FPGAs
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
Towards scalable FPGA CAD through architecture
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
Scalable and deterministic timing-driven parallel placement for FPGAs
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
Parallel cross-layer optimization of high-level synthesis and physical design
Proceedings of the 16th Asia and South Pacific Design Automation Conference
Efficient and Deterministic Parallel Placement for FPGAs
ACM Transactions on Design Automation of Electronic Systems (TODAES)
StarPlace: A new analytic method for FPGA placement
Integration, the VLSI Journal
A fast discrete placement algorithm for FPGAs
Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
Analyzing System-Level Information’s Correlation to FPGA Placement
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
High-Level Abstractions and Modular Debugging for FPGA Design Validation
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Hi-index | 0.00 |
In this paper, we describe the application of two parallelization strategies to the Quartus II FPGA placer. The first uses a pipelining approach and achieves speedups of 1.3x on two processing cores. The second uses a parallel moves approach and achieves speedups of 2.2x on four cores. Unlike all previous parallel moves algorithms, ours is deterministic and always gives the same answer as the serial version of the algorithm, without any significant reduction in performance. We also describe a process to quantify multi-core performance effects, such as memory subsystem limitations and explicit synchronization overhead, and fully describe these effects on a CAD tool for the first time. Memory limitations alone are found to cost up to 35% of total runtime. Unlike previous algorithms, our algorithms have negligible explicit synchronization overhead. These results are relevant to both CAD designers and to any developers seeking to parallelize existing software.