Computational and statistical methods in bioinformatics
AM'03 Proceedings of the Second international conference on Active Mining
Motivation: Homology search is one of the most fundamental tools in bioinformatics. Typical alignment algorithms use substitution matrices and gap costs, so improving the substitution matrices increases the accuracy of homology searches. Generally, substitution matrices are derived from aligned sequences whose relationships are known, and gap costs are determined by trial and error. To discriminate relationships more clearly, we optimize the substitution matrices from a statistical viewpoint, using both positive and negative examples together with Bayesian decision theory.
Results: Using the Clusters of Orthologous Groups (COG) database, we optimized substitution matrices. On the COG database, the classification accuracy of the obtained matrix is better than that of conventional substitution matrices. The matrix also performs well in classification on other databases.
Availability: The optimized substitution matrices and the programs are available from http://olab.is.s.u-tokyo.ac.jp/~hourai/optssd/index.html
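As a rough illustration of the setting described in the abstract, the sketch below scores sequence pairs with a substitution matrix and a gap cost, then measures how well a score cutoff separates positive (related) pairs from negative (unrelated) ones. It is not the authors' method: the match/mismatch matrix over a nucleotide alphabet, the linear gap cost, the fixed threshold (standing in for a Bayesian decision rule), and all function names are assumptions made for illustration only.

```python
from itertools import product


def toy_matrix(alphabet, match=2, mismatch=-1):
    """Build a toy substitution matrix (hypothetical values, not an optimized matrix)."""
    return {(a, b): (match if a == b else mismatch)
            for a, b in product(alphabet, repeat=2)}


def local_alignment_score(s, t, matrix, gap=2):
    """Best local alignment score (Smith-Waterman recurrence, linear gap cost)."""
    prev = [0] * (len(t) + 1)
    best = 0
    for a in s:
        curr = [0]
        for j, b in enumerate(t, start=1):
            score = max(
                0,                            # start a new local alignment
                prev[j - 1] + matrix[(a, b)],  # align a with b
                prev[j] - gap,                # gap in t
                curr[j - 1] - gap,            # gap in s
            )
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def classification_accuracy(positives, negatives, matrix, gap, threshold):
    """Fraction of pairs classified correctly by a simple score cutoff.

    A pair is called 'related' when its score reaches the threshold.
    """
    correct = sum(local_alignment_score(s, t, matrix, gap) >= threshold
                  for s, t in positives)
    correct += sum(local_alignment_score(s, t, matrix, gap) < threshold
                   for s, t in negatives)
    return correct / (len(positives) + len(negatives))


if __name__ == "__main__":
    m = toy_matrix("ACGT")
    positives = [("ACGT", "TTACGTTT")]  # made-up related pair
    negatives = [("AAAA", "TTTT")]      # made-up unrelated pair
    print(classification_accuracy(positives, negatives, m, gap=2, threshold=4))
```

Under these assumptions, comparing two candidate matrices amounts to calling `classification_accuracy` with each one on the same positive and negative pairs; an optimization procedure would adjust the matrix entries to improve that separation.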