Optimizing substitution matrices by separating score distributions

  • Authors:
  • Yuichiro Hourai; Tatsuya Akutsu; Yutaka Akiyama

  • Affiliations:
  • Department of Computer Science, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan; Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan; Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), Aomi Frontier Bldg. 17F, 2-43 Aomi, Koto-ku, Tokyo 135-0064, Japan

  • Venue:
  • Bioinformatics
  • Year:
  • 2004

Quantified Score

Hi-index 3.84

Abstract

Motivation: Homology search is one of the most fundamental tools in bioinformatics. Typical alignment algorithms rely on substitution matrices and gap costs, so improving substitution matrices increases the accuracy of homology searches. Substitution matrices are generally derived from aligned sequences whose relationships are known, while gap costs are determined by trial and error. To discriminate relationships more clearly, we optimize substitution matrices from a statistical viewpoint, using both positive and negative examples and Bayesian decision theory.

Results: Using the Clusters of Orthologous Groups (COG) database, we optimized substitution matrices. The obtained matrix classifies the COG database more accurately than conventional substitution matrices, and it also performs well in classification on other databases.

Availability: The optimized substitution matrices and the programs are available from http://olab.is.s.u-tokyo.ac.jp/~hourai/optssd/index.html
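
As a rough illustration of the setting described in the abstract (and not the authors' optimization procedure, which is not detailed here), the sketch below scores sequence pairs with a substitution matrix and a gap cost, then measures how well a single score threshold separates positive (related) pairs from negative (unrelated) pairs. The toy matrix, the linear gap cost, the example sequences, and the function names `local_alignment_score` and `best_threshold_accuracy` are all assumptions made for illustration.

```python
from itertools import product


def local_alignment_score(a, b, subst, gap):
    """Smith-Waterman local alignment score with a linear gap cost."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            score = max(
                0,                            # start a new local alignment
                prev[j - 1] + subst[(x, y)],  # align x with y
                prev[j] - gap,                # x aligned to a gap
                cur[j - 1] - gap,             # y aligned to a gap
            )
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def best_threshold_accuracy(pos_scores, neg_scores):
    """Best accuracy of the rule 'score >= t means related' over thresholds t."""
    total = len(pos_scores) + len(neg_scores)
    candidates = sorted(set(pos_scores) | set(neg_scores)) + [float("inf")]
    best = 0.0
    for t in candidates:
        correct = sum(s >= t for s in pos_scores) + sum(s < t for s in neg_scores)
        best = max(best, correct / total)
    return best


# Hypothetical toy matrix over a 4-letter alphabet (values are not from the paper).
ALPHABET = "ACDE"
toy_matrix = {(x, y): (2 if x == y else -1) for x, y in product(ALPHABET, repeat=2)}

# Hypothetical labelled examples: related pairs and unrelated pairs.
positive_pairs = [("ACDE", "ACDE"), ("ACDEA", "ACEA")]
negative_pairs = [("AAAA", "CCCC"), ("ADAD", "CECE")]

pos_scores = [local_alignment_score(a, b, toy_matrix, gap=2) for a, b in positive_pairs]
neg_scores = [local_alignment_score(a, b, toy_matrix, gap=2) for a, b in negative_pairs]
accuracy = best_threshold_accuracy(pos_scores, neg_scores)
print(pos_scores, neg_scores, accuracy)
```

A matrix whose positive and negative score distributions overlap less yields a higher value from `best_threshold_accuracy`, which is one simple way to compare candidate matrices on labelled data.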