Reconfigurable Computers (RCs) with FPGA hardware co-processors can achieve significant performance improvements over traditional microprocessor (µP)-based computers for many scientific applications. The achievable speedup depends on the intrinsic parallelism of the target application as well as on the characteristics of the target platform. In this work, we use image processing applications as a case study to demonstrate how hardware designs are parameterized by the co-processor architecture, particularly its data I/O: the local memory of the FPGA device and the interconnect between the FPGA and the µP. Applications that access data randomly must use the local memory; image registration is a typical example. In contrast, an application such as edge detection can read data sequentially, directly through the interconnect. Two image registration algorithms, the exhaustive search algorithm and the Discrete Wavelet Transform (DWT)-based search algorithm, are implemented in hardware on the Xilinx Virtex-II Pro 50 FPGA of the Cray XD1 reconfigurable computer. The hardware implementations achieve speedups of 10× and 2×, respectively. For applications that read data directly through the interconnect, the hardware implementation of Canny edge detection achieves a 544× speedup.
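To make the exhaustive search idea concrete, the following is a minimal software sketch, not the hardware design described above. It assumes a translation-only search over integer shifts and a mean-absolute-difference score; the abstract specifies neither the transformation space nor the similarity measure. The function names `sad_at_shift` and `exhaustive_register` are hypothetical.

```python
# Illustrative sketch only: translation-only exhaustive search.
# Assumptions (not from the source): integer shifts, mean absolute
# difference as the score, images stored as equal-sized lists of rows.

def sad_at_shift(reference, target, dx, dy):
    """Mean absolute difference between reference[y][x] and
    target[y + dy][x + dx], taken over the overlapping pixels."""
    height, width = len(reference), len(reference[0])
    total, count = 0, 0
    for y in range(height):
        ty = y + dy
        if not 0 <= ty < height:
            continue  # this row falls outside the target image
        for x in range(width):
            tx = x + dx
            if 0 <= tx < width:
                total += abs(reference[y][x] - target[ty][tx])
                count += 1
    return total / count if count else float("inf")


def exhaustive_register(reference, target, max_shift):
    """Score every shift in [-max_shift, max_shift] on both axes and
    return the (dx, dy) with the lowest score."""
    best_score, best_dx, best_dy = float("inf"), 0, 0
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = sad_at_shift(reference, target, dx, dy)
            if score < best_score:
                best_score, best_dx, best_dy = score, dx, dy
    return best_dx, best_dy
```

Every candidate shift rereads both images, so the whole image pair has to stay accessible for the duration of the search. This is one reason such a search is naturally served from memory local to the FPGA rather than from a single sequential pass over the interconnect. A search space that also includes rotation would make the target accesses non-sequential as well.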