A maximum-likelihood formulation and EM algorithm for the protein multiple alignment problem

Authors:
Valentina Sulimova;Nikolay Razin;Vadim Mottl;Ilya Muchnik;Casimir Kulikowski
Affiliations:
Tula State University, Tula, Russia;MIPT, Moscow, Russia;Computing Center of the RAS, Moscow, Russia;Rutgers University, DIMACS, New Brunswick, NJ;Rutgers University, Department of Computer Science, New Brunswick, NJ
Venue:
PRIB'10: Proceedings of the 5th IAPR International Conference on Pattern Recognition in Bioinformatics
Year:
2010

Abstract

We treat a given group of protein sequences of different lengths as the result of random transformations of independent random ancestor sequences that share the same preset, smaller length, each generated according to an unknown common probabilistic profile. The transformation process is described by a Hidden Markov Model (HMM) that directly generalizes the PAM model for amino acids. We formulate the problem of finding the maximum-likelihood probabilistic ancestor profile and demonstrate its practicality. The proposed method yields the ancestor profile and the posterior distribution of its HMM simultaneously, which permits efficient determination of the most probable multiple alignment of all the sequences. Results on the BAliBASE 3.0 protein alignment benchmark indicate that the proposed method is generally more accurate than popular multiple alignment methods such as CLUSTALW, DIALIGN, and ProbAlign.