Recently, it has become possible to implement floating-point cores on field-programmable gate arrays (FPGAs) to provide acceleration for the myriad applications that require high-performance floating-point arithmetic. To achieve high clock rates, floating-point cores for FPGAs must be deeply pipelined. This deep pipelining makes it difficult to reuse the same floating-point core for a series of dependent computations. However, floating-point cores use a great deal of area, so it is important to use as few of them in an architecture as possible. In this paper, we describe area-efficient architectures and algorithms for arithmetic expression evaluation. Such expression evaluation is necessary in applications from a wide variety of fields, including scientific computing and cognition. The proposed designs effectively hide the pipeline latency of the floating-point cores and use at most two floating-point cores for each type of operator in the expression. While best suited for particular classes of expressions, the proposed designs can evaluate general expressions as well. Additionally, multiple expressions can be evaluated without reconfiguration. Experimental results show that the areas of our designs increase linearly with the number of types of operations in the expression and that our designs occupy less area and achieve higher throughput than designs generated by a commercial hardware compiler.
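To make the latency problem concrete, the following is a minimal sketch in Python, not the architecture proposed in the paper. It models a single pipelined adder and compares a fully dependent chain of additions with a simple greedy schedule that issues an addition whenever two operands are ready. The function names (`chain_cycles`, `greedy_cycles`), the one-issue-per-cycle rule, and the greedy policy are all assumptions made for illustration.

```python
from collections import deque


def chain_cycles(n, latency):
    """Cycles to sum n values as one dependent chain on a single pipelined adder.

    Every addition needs the previous result, so only one operation is ever
    in the pipeline and each of the n - 1 additions costs the full latency.
    """
    return (n - 1) * latency


def greedy_cycles(values, latency):
    """Simulate one pipelined adder (hypothetical model, one issue per cycle).

    An addition is issued whenever two operands are ready, so independent
    additions overlap in the pipeline. Returns (finish_cycle, total).
    """
    ready = deque(values)   # operands available at the current cycle
    in_flight = deque()     # (finish_cycle, result), kept in issue order
    cycle = 0
    while len(ready) + len(in_flight) > 1:
        # Retire every result whose latency has elapsed.
        while in_flight and in_flight[0][0] <= cycle:
            ready.append(in_flight.popleft()[1])
        # Issue at most one addition in this cycle.
        if len(ready) >= 2:
            a, b = ready.popleft(), ready.popleft()
            in_flight.append((cycle + latency, a + b))
        cycle += 1
    if in_flight:
        # The last addition is still in the pipeline; report when it finishes.
        return in_flight.popleft()
    return cycle, ready[0]


if __name__ == "__main__":
    data = list(range(64))  # integers, so reordering cannot change the sum
    print("chain :", chain_cycles(len(data), 10))
    print("greedy:", greedy_cycles(data, 10))
```

In this toy model the greedy schedule keeps the single adder busy with independent additions, which is the general idea behind hiding pipeline latency. With real floating-point operands, reordering the additions can change rounding, so the result may differ slightly from the chained sum.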