Designing Patterns and Profiles for Faster HMM Search

Authors:
Yanni Sun; Jeremy Buhler
Affiliations:
Washington University, St. Louis; Washington University, St. Louis
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2009

Abstract

Profile HMMs are powerful tools for modeling conserved motifs in proteins. Search tools use them widely to classify new protein sequences into families based on their domain architecture. However, the proliferation of known motifs and of new proteomic sequence data makes search computationally challenging: annotating a single organism's proteome can take days of CPU time. It is therefore highly desirable to speed up HMM search in large databases. We design PROSITE-like patterns and short profiles that serve as filters, rapidly eliminating protein-motif pairs for which a full profile HMM comparison would not yield a significant match. The design of pattern-based filters is formulated as a multiple-choice knapsack problem. Profile-based filters with high sensitivity are extracted from a profile HMM according to their theoretical sensitivity and false positive rate. Experiments show that our profile-based filters achieve high sensitivity (near 100 percent) while maintaining roughly a 20× speedup over the unfiltered search program. Pattern-based filters typically retain at least 90 percent of the sensitivity of the source HMM with a 30-40× speedup. The profile-based filters have sensitivity comparable to that of the multistage filtering strategy HMMERHEAD [15] and are faster in most of our experiments.
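
The abstract states only that pattern-filter design is cast as a multiple-choice knapsack problem; it does not give the exact objective or constraints. As an illustration of that problem class (not the authors' formulation), here is a minimal dynamic-programming sketch of the generic multiple-choice knapsack: pick exactly one item from each group, keep the total integer weight within a capacity, and maximize total value. The mapping is a hypothetical assumption. Each group could stand for one pattern position and each item for a candidate residue class at that position. Value could be a log-sensitivity contribution, so sums correspond to products of match probabilities. Weight could be a discretized false-positive cost, with broader residue classes costing more. The function name `multichoice_knapsack` and the item encoding are invented for this sketch.

```python
from typing import List, Tuple

# Hypothetical encoding: an item is (integer weight, value); a group is the list of
# alternatives from which exactly one must be chosen (for instance, candidate residue
# classes for one pattern position; this mapping is an illustrative assumption).
Item = Tuple[int, float]


def multichoice_knapsack(groups: List[List[Item]], capacity: int) -> Tuple[float, List[int]]:
    """Choose exactly one item per group so that total weight <= capacity and
    total value is maximal. Returns (best_value, index_chosen_in_each_group).

    Dynamic programming over groups: O(capacity * total number of items) time.
    """
    if capacity < 0 or any(w < 0 for group in groups for w, _ in group):
        raise ValueError("capacity and weights must be non-negative integers")
    NEG = float("-inf")
    # best[c]: best value over the groups processed so far using total weight exactly c.
    best = [NEG] * (capacity + 1)
    best[0] = 0.0
    picks: List[List[int]] = []  # picks[g][c]: item of group g used to reach weight c
    for group in groups:
        new_best = [NEG] * (capacity + 1)
        pick = [-1] * (capacity + 1)
        for c in range(capacity + 1):
            if best[c] == NEG:
                continue  # weight c is unreachable with the previous groups
            for i, (w, v) in enumerate(group):
                nc = c + w
                if nc <= capacity and best[c] + v > new_best[nc]:
                    new_best[nc] = best[c] + v
                    pick[nc] = i
        best = new_best
        picks.append(pick)
    end = max(range(capacity + 1), key=lambda c: best[c])
    if best[end] == NEG:
        raise ValueError("no selection fits within the capacity")
    # Backtrack from the best final weight to recover one item index per group.
    chosen: List[int] = []
    c = end
    for g in range(len(groups) - 1, -1, -1):
        i = picks[g][c]
        chosen.append(i)
        c -= groups[g][i][0]
    chosen.reverse()
    return best[end], chosen
```

For example, `multichoice_knapsack([[(1, 0.9), (2, 1.5)], [(1, 0.5), (3, 2.0)]], 3)` returns `(2.0, [1, 0])`: the second option of the first group plus the first option of the second group. The DP is pseudo-polynomial in the capacity, so real-valued costs such as log probabilities would have to be scaled and rounded to integers first.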