XD1000 is an innovative reconfigurable supercomputing platform developed by XtremeData Inc. to exploit the rapid progress of FPGA technology and the high performance of the HyperTransport interconnect. In this paper, we present implementations of the Smith-Waterman algorithm for both DNA and protein sequences on this platform. The main features are: (1) a multistage PE (processing element) design that significantly reduces FPGA resource usage and hence allows more parallelism to be exploited; (2) a pipelined control mechanism with uneven stage latencies, which is key to minimizing the overall PE pipeline cycle time; (3) a compressed substitution matrix storage structure that substantially decreases on-chip SRAM usage. Finally, we implement a 384-PE systolic array running at 66.7 MHz, which achieves a peak performance of 25.6 GCUPS. Compared with the 2.2 GHz AMD Opteron host processor, the FPGA coprocessor achieves speedups of 185X and 250X, respectively.
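As a rough illustration of the quantities in the abstract, the following Python sketch shows a software reference for the Smith-Waterman recurrence and the arithmetic behind the peak figure. It is not the paper's hardware design. The function names, the linear gap penalty, and the scoring values (match 2, mismatch -1, gap -1) are assumptions chosen for the example, and the peak calculation assumes each PE completes one cell update per clock cycle, which is consistent with 384 PEs at 66.7 MHz giving about 25.6 GCUPS.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of strings a and b.

    Assumption: simple match/mismatch scoring with a linear gap penalty.
    The paper's designs use substitution matrices for proteins; this
    sketch only illustrates the cell-update recurrence.
    """
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    best = 0
    for ca in a:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap
            left = curr[j - 1] + gap
            cell = max(0, diag, up, left)  # scores never drop below 0
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def peak_gcups(num_pes, clock_mhz):
    """Peak giga cell updates per second.

    Assumption: every PE finishes one cell update per clock cycle.
    """
    return num_pes * clock_mhz * 1e6 / 1e9


if __name__ == "__main__":
    print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
    print(round(peak_gcups(384, 66.7), 1))
```

Each inner-loop iteration is one "cell update", the unit counted by GCUPS. In a systolic array, the cells along one anti-diagonal of the score matrix have no dependencies on one another, so they can be computed in parallel, one per PE.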