We present the detailed field-programmable gate array (FPGA) design of the Maximum Parsimony method for molecular phylogenetic analysis and its implementation on the nodes of Maxwell, an FPGA supercomputer. This is the first FPGA implementation of this method for nucleotide sequence data reported in the literature. The hardware architecture consists of a linear systolic array of 20 processing elements, each of which performs Sankoff's algorithm for a different tree topology, with all elements running in parallel. Over several iterations, the array computes the scores of all theoretically possible trees for a given number of taxa. The maximum number of taxa currently supported is 12, but this limit can easily be increased. The resulting implementation outperforms an equivalent desktop software implementation (the Phylogenetic Analysis Using Parsimony, PAUP, software) by several orders of magnitude. On a single node of the Maxwell machine, the speed-up reaches up to four orders of magnitude for the 12-taxa case, and implementations spanning several Maxwell nodes can yield even higher speed-ups. These gains come from harnessing both the coarse-grain and the fine-grain parallelism available in the algorithm and in the hardware platform.
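To make the per-topology work concrete, the following is a minimal software sketch of Sankoff's algorithm for a single nucleotide site on one fixed binary tree, together with the standard count of unrooted binary topologies, (2n-5)!!, which shows why exhaustive search grows so quickly. It is an illustration only, not the hardware design described above: the nested-tuple tree encoding, the unit substitution cost, and all function names are assumptions made for this example, and the abstract does not state whether the hardware enumerates rooted or unrooted trees.

```python
from math import inf

NUCLEOTIDES = "ACGT"


def unit_cost(a, b):
    """Assumed cost matrix: 0 for a match, 1 for any substitution."""
    return 0 if a == b else 1


def sankoff(tree, cost=unit_cost):
    """Return the Sankoff cost vector (one entry per nucleotide) for `tree`.

    Hypothetical encoding: a leaf is a one-letter string such as "A";
    an internal node is a tuple of its child subtrees.
    """
    if isinstance(tree, str):
        # Leaf: zero cost for the observed state, infinite for the others.
        return [0 if s == tree else inf for s in NUCLEOTIDES]
    child_vectors = [sankoff(child, cost) for child in tree]
    vector = []
    for s in NUCLEOTIDES:
        total = 0
        for cv in child_vectors:
            # Cheapest way to reach any child state t from parent state s.
            total += min(cost(s, t) + cv[j] for j, t in enumerate(NUCLEOTIDES))
        vector.append(total)
    return vector


def parsimony_score(tree, cost=unit_cost):
    """Parsimony score of one site: the minimum entry of the root vector."""
    return min(sankoff(tree, cost))


def count_unrooted_trees(n_taxa):
    """Number of unrooted binary topologies, (2n-5)!!, for n_taxa >= 3."""
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    print(parsimony_score((("A", "C"), ("A", "G"))))
    print(count_unrooted_trees(12))
```

Under these assumptions, each processing element in the array would correspond to one call to `parsimony_score` on a distinct topology, and the rapid growth of `count_unrooted_trees` indicates why the scores of all trees have to be computed over several iterations.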