In this paper we present a new Multiple Sequence Alignment (MSA) algorithm called AntiClusAl. The method makes use of the commonly used idea of aligning homologous sequences belonging to classes generated by some clustering algorithm, and then continuing the alignment process in a bottom-up way along a suitable tree structure. The final result is then read at the root of the tree. Multiple sequence alignment in each cluster makes use of progressive alignment with the 1-median (center) of the cluster. The 1-median of a set S of sequences is the element of S which minimizes the average distance from any other sequence in S. Its exact computation requires quadratic time. The basic idea of our proposed algorithm is to make use of a simple and natural algorithmic technique based on randomized tournaments, which has been successfully applied to large search problems in general metric spaces. In particular, a clustering algorithm called Antipole tree and an approximate linear-time 1-median computation are used. Compared with Clustal W, a widely used tool for MSA, our algorithm shows better running times with fully comparable alignment quality. A successful biological application showing high amino acid conservation during the evolution of Xenopus laevis SOD2 is also cited.
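To make the 1-median definition concrete, the following is a minimal Python sketch, not the AntiClusAl implementation. It contrasts the exact quadratic computation with one plausible reading of a randomized tournament: sequences are shuffled into small groups, the exact 1-median of each group advances, and the rounds repeat until few candidates remain. The use of Levenshtein edit distance as the metric, the function names, and the `group_size` and `threshold` parameters are all assumptions made for illustration; the abstract does not specify them.

```python
import random


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program.

    Assumption: the abstract does not say which sequence distance is used.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def exact_one_median(seqs, dist=edit_distance):
    """Element of seqs with the smallest summed distance to all others.

    Minimizing the sum is equivalent to minimizing the average.
    Needs a quadratic number of distance evaluations.
    """
    return min(seqs, key=lambda s: sum(dist(s, t) for t in seqs))


def tournament_one_median(seqs, group_size=3, threshold=5,
                          dist=edit_distance, rng=None):
    """Approximate 1-median by a randomized tournament (illustrative sketch).

    group_size and threshold are hypothetical tuning parameters.
    """
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each round shrinks the pool")
    rng = rng or random.Random(0)
    pool = list(seqs)
    while len(pool) > threshold:
        rng.shuffle(pool)
        groups = [pool[i:i + group_size]
                  for i in range(0, len(pool), group_size)]
        # The local winner of each small group advances to the next round.
        pool = [exact_one_median(g, dist) for g in groups]
    # The few survivors are compared exactly.
    return exact_one_median(pool, dist)
```

Each round performs a constant amount of work per group, and the pool shrinks by roughly a factor of `group_size`, so the total number of distance evaluations grows linearly with the number of sequences. The price is that the returned element is only an approximation of the true 1-median.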