An adaptive data prefetching scheme for biosequence database search on reconfigurable platforms
Proceedings of the 2007 ACM symposium on Applied computing
Searching DNA and protein databases with sequence comparison algorithms has become one of the most powerful techniques for understanding the function of particular biological sequences. However, the processing requirements of biological data exceed the capabilities of general-purpose processors. FPGAs (Field Programmable Gate Arrays) connected to server processors have been used to accelerate similarity searches, but reconfigurable computing platforms have relied on an external I/O bus as the communication channel, which limits the communication speed between the host processor and the FPGA. This communication bottleneck often offsets the application speedup enabled by FPGAs. In this paper we present an adaptive data prefetching scheme, derived through profiling and quantitative analysis, that avoids stalls of the reconfigurable coprocessor caused by data unavailability. Experimental results on various query sequences show that the proposed scheme eliminates a major portion of the data access penalty, increases the throughput of the FPGA implementation by up to 42%, and achieves a speedup of 110 with affine gap penalties over a standard PC implementation.
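For readers unfamiliar with the term, the following is a minimal software sketch of what scoring with affine gap penalties means in a local sequence comparison. The abstract does not specify the recurrences or parameters used on the FPGA, so everything here is an assumption for illustration: the function name `affine_local_score`, the scoring values, and the choice of the standard Smith-Waterman formulation with Gotoh-style affine gaps. It is not the paper's implementation and says nothing about the prefetching scheme itself.

```python
def affine_local_score(query, subject, match=2, mismatch=-1,
                       gap_open=-3, gap_extend=-1):
    """Best local alignment score with affine gap penalties.

    Hypothetical scoring values. A gap of length k costs
    gap_open + (k - 1) * gap_extend, so opening a gap is more
    expensive than extending one.
    """
    neg_inf = float("-inf")
    n = len(subject)
    h_prev = [0] * (n + 1)        # H for the previous query row
    f_prev = [neg_inf] * (n + 1)  # F: alignments ending in a gap in the subject
    best = 0

    for i in range(1, len(query) + 1):
        h_cur = [0] * (n + 1)
        f_cur = [neg_inf] * (n + 1)
        e = neg_inf               # E: alignments ending in a gap in the query
        for j in range(1, n + 1):
            # Either open a new gap from H or extend an existing one.
            e = max(h_cur[j - 1] + gap_open, e + gap_extend)
            f_cur[j] = max(h_prev[j] + gap_open, f_prev[j] + gap_extend)
            s = match if query[i - 1] == subject[j - 1] else mismatch
            # Local alignment: scores never drop below zero.
            h_cur[j] = max(0, h_prev[j - 1] + s, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur

    return best


if __name__ == "__main__":
    print(affine_local_score("ACGT", "ACGT"))
    print(affine_local_score("ACGTACGT", "ACGTTTACGT"))
```

Each cell depends only on its left, upper, and upper-left neighbours, which is why this kind of computation maps well onto parallel hardware and why the rate at which database sequences reach the accelerator can become the limiting factor.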