Pairwise Statistical Significance of Local Sequence Alignment Using Sequence-Specific and Position-Specific Substitution Matrices

Authors:
Ankit Agrawal; Xiaoqiu Huang
Affiliations:
Iowa State University, Ames; Iowa State University, Ames
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2011



Abstract

Pairwise sequence alignment is a central problem in bioinformatics and forms the basis of many other applications. Two related sequences are expected to have a high alignment score, but relatedness is usually judged by statistical significance rather than by the alignment score itself. Recently, it was shown that pairwise statistical significance gives promising results as an alternative to database statistical significance for obtaining individual significance estimates of pairwise alignment scores. The improvement was mainly attributed to making the statistical significance estimation process more sequence-specific and database-independent. In this paper, we use sequence-specific and position-specific substitution matrices to derive estimates of pairwise statistical significance, an approach that is expected to bring more sequence-specific information into the estimation. Experiments were conducted on a benchmark database with sequence-specific substitution matrices at different levels of sequence-specific contribution. The results confirm that using sequence-specific substitution matrices for estimating pairwise statistical significance is significantly better than using a standard matrix like BLOSUM62, and better than the database statistical significance estimates reported by popular database search programs like BLAST, PSI-BLAST (without pretrained PSSMs), and SSEARCH. With pretrained PSSMs, however, PSI-BLAST results are significantly better. Further, using position-specific substitution matrices for estimating pairwise statistical significance gives significantly better results than even PSI-BLAST using pretrained PSSMs.
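The abstract does not describe the estimation procedure itself. As a rough illustration of what a database-independent, pairwise significance estimate can look like, the sketch below scores one sequence against shuffled copies of the other and fits an extreme value (Gumbel) distribution to those null scores. Everything in it is an assumption made for illustration, not the authors' method: the toy match/mismatch scoring (standing in for BLOSUM62 or a sequence-specific matrix), the linear gap penalty, the shuffling null model, the method-of-moments fit, and the hypothetical function names `local_alignment_score` and `pairwise_p_value`.

```python
import math
import random


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    The match/mismatch values are a toy stand-in for a real substitution matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def pairwise_p_value(a, b, n_shuffles=200, seed=0):
    """Estimate P(score >= observed) for the pair (a, b) without any database.

    Null scores come from aligning `a` against shuffled copies of `b`;
    a Gumbel distribution is fitted to them by the method of moments.
    """
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)

    letters = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # preserves the composition and length of b
        null_scores.append(local_alignment_score(a, "".join(letters)))

    mean = sum(null_scores) / n_shuffles
    var = sum((s - mean) ** 2 for s in null_scores) / (n_shuffles - 1)
    if var == 0:
        # Degenerate null distribution: fall back to a crude 0/1 answer.
        return observed, (1.0 if observed <= mean else 0.0)

    # Gumbel moments: variance = (pi * beta)^2 / 6, mean = mu + gamma * beta.
    beta = math.sqrt(6.0 * var) / math.pi
    mu = mean - 0.5772156649015329 * beta

    z = max((observed - mu) / beta, -50.0)  # clamp to avoid overflow in exp
    p = -math.expm1(-math.exp(-z))  # 1 - exp(-exp(-z)), computed stably
    return observed, p


if __name__ == "__main__":
    query = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    related = query[:10] + "W" + query[11:]
    print(pairwise_p_value(query, related))
```

Because the null scores depend only on the two sequences being compared, the resulting estimate is specific to that pair. Swapping the toy scoring for a sequence-specific or position-specific matrix would change only `local_alignment_score`.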