A central question in pairwise sequence comparison is assessing the statistical significance of the alignment. For ungapped alignments scored with a single substitution matrix, the alignment score distribution is known to follow an extreme value distribution whose parameters K and λ can be calculated analytically. However, no statistical theory is currently available for the gapped case or for alignments that use multiple scoring matrices. Their score distribution is nevertheless known to closely follow an extreme value distribution, and the corresponding parameters can be estimated by simulation. Ideal estimation would require a separate simulation for each sequence pair, which is impractical. In this paper, we present a simple clustering-classification approach based on amino acid composition to estimate K and λ for a given sequence pair and scoring scheme, including schemes that use multiple parameter sets. The resulting K and λ values for different cluster pairs vary widely even for the same scoring scheme, which underscores the heavy dependence of K and λ on amino acid composition. The proposed approach is an attempt to separate out the influence of amino acid composition when estimating the statistical significance of pairwise protein alignments. Experiments, together with an analysis of other approaches to estimating the statistical parameters, indicate that the methods used in this work estimate statistical significance with good accuracy.
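The abstract combines three ideas: significance from the extreme value parameters K and λ, estimation of those parameters by simulation, and lookup of precomputed parameters by amino acid composition cluster. The Python sketch below illustrates all three under stated assumptions. It uses the standard Karlin-Altschul relation E = K·m·n·e^(-λS) without edge-effect correction. It fits K and λ to simulated scores by the method of moments for a Gumbel distribution. It assigns each sequence to its nearest composition centroid by squared Euclidean distance. The fitting method, the nearest-centroid rule, the symmetric cluster-pair table, all function names, and the synthetic sampler `sample_gumbel_scores` (a hypothetical stand-in for gapped alignment of shuffled sequences) are illustrative assumptions, not the paper's procedure.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
EULER_GAMMA = 0.5772156649015329


def evalue(score: float, m: int, n: int, K: float, lam: float) -> float:
    """Karlin-Altschul E-value K*m*n*exp(-lambda*S), without edge-effect correction."""
    return K * m * n * math.exp(-lam * score)


def pvalue(score: float, m: int, n: int, K: float, lam: float) -> float:
    """P(best local score >= S) = 1 - exp(-E) under the extreme value distribution."""
    return -math.expm1(-evalue(score, m, n, K, lam))


def fit_gumbel(scores: Sequence[float], m: int, n: int) -> Tuple[float, float]:
    """Method-of-moments estimate of (K, lambda) from simulated optimal scores
    of random sequence pairs of lengths m and n (an assumed fitting method)."""
    count = len(scores)
    mean = sum(scores) / count
    var = sum((s - mean) ** 2 for s in scores) / (count - 1)
    lam = math.pi / math.sqrt(6.0 * var)  # Gumbel variance is pi^2 / (6 * lambda^2)
    mu = mean - EULER_GAMMA / lam         # Gumbel mean is mu + gamma / lambda
    K = math.exp(lam * mu) / (m * n)      # location satisfies K*m*n = exp(lambda*mu)
    return K, lam


def sample_gumbel_scores(K: float, lam: float, m: int, n: int,
                         count: int, rng: random.Random) -> List[float]:
    """Hypothetical stand-in for aligning shuffled sequences: draws scores
    directly from the Gumbel law with the given K and lambda."""
    mu = math.log(K * m * n) / lam
    # If E ~ Exp(1), then -log(E) follows the standard (maximum) Gumbel law.
    return [mu - math.log(rng.expovariate(1.0)) / lam for _ in range(count)]


def composition(seq: str) -> List[float]:
    """Relative frequency of each of the 20 standard amino acids."""
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts) or 1
    return [c / total for c in counts]


def nearest_cluster(seq: str, centroids: Sequence[Sequence[float]]) -> int:
    """Index of the centroid closest (squared Euclidean) to the sequence composition."""
    comp = composition(seq)
    return min(range(len(centroids)),
               key=lambda i: sum((x - y) ** 2 for x, y in zip(comp, centroids[i])))


def lookup_params(seq1: str, seq2: str, centroids: Sequence[Sequence[float]],
                  table: Dict[Tuple[int, int], Tuple[float, float]]) -> Tuple[float, float]:
    """Return precomputed (K, lambda) for the (unordered) pair of clusters."""
    c1, c2 = nearest_cluster(seq1, centroids), nearest_cluster(seq2, centroids)
    return table[(min(c1, c2), max(c1, c2))]
```

In this sketch, the simulation step runs once per cluster pair and scoring scheme to fill `table`. Scoring a new sequence pair then needs only two composition lookups and a call to `pvalue`. Fitting a large synthetic sample, for example 100,000 scores drawn with K = 0.1 and λ = 0.3, should recover λ to within a few percent. K is noticeably more sensitive to sampling error, because it depends exponentially on the fitted location.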