Towards a comprehensive collection of diagnostic patterns for protein sequence classification

Authors:
Björn Olsson; Kim Laurio
Affiliations:
Department of Computer Science, University of Skövde, Box 408, 541 28 Skövde, Sweden (both authors)
Venue:
Information Sciences—Informatics and Computer Science: An International Journal
Year:
2002

Abstract

The PROSITE collection of patterns for family classification of protein sequences requires much manual labour for motif finding and pattern updating, and yet has only moderate classification accuracy. Out of 1026 families with patterns in PROSITE release 16.0, only 523 (51%) had a diagnostic pattern, i.e., a pattern that discriminates perfectly between family and non-family sequences in the training set. Therefore, there is a need for reliable methods that automate motif finding and pattern construction, so that improved speed can be combined with greater classification accuracy.

In this paper we present our approach to automating the construction of a pattern collection, and we announce release 1.0 of the pattern collection built by motif-finding by analysis of multiple alignments (MAMA). MAMA is found to improve classification accuracy over PROSITE by finding many more diagnostic patterns: on 926 tested families, MAMA finds such patterns for 771 (83%). Furthermore, both the average specificity and the average sensitivity of MAMA patterns are found to be higher than those of PROSITE patterns.

A WWW interface that allows users to submit sequences and scan them for matches in the MAMA pattern collection is available, together with a listing of all the patterns in MAMA release 1.0.
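The notion of a diagnostic pattern, and the sensitivity and specificity used to compare MAMA with PROSITE, can be illustrated with a small sketch. It rests on several assumptions not stated in the abstract: the helper names `prosite_to_regex` and `evaluate` are hypothetical, the translator covers only a common subset of PROSITE pattern syntax, the sequences are invented toy data, and sensitivity is taken as TP/(TP+FN) and specificity as TN/(TN+FP) (some pattern-discovery work instead uses TP/(TP+FP) for specificity). Under either definition a diagnostic pattern scores 1.0 on both measures, because it has no false negatives and no false positives on the training set.

```python
import re

# Hypothetical helper: translates a common subset of PROSITE pattern syntax
# (x, single residues, [..], {..}, (n) / (n,m) repeats, < and > anchors)
# into a Python regular expression. Any other syntax raises ValueError.
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    pattern = pattern.strip().rstrip(".")
    prefix = suffix = ""
    if pattern.startswith("<"):
        prefix, pattern = "^", pattern[1:]   # anchored at the N-terminus
    if pattern.endswith(">"):
        suffix, pattern = "$", pattern[:-1]  # anchored at the C-terminus
    parts = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            regex = "."                          # any amino acid
        elif residue.startswith("{"):
            regex = "[^" + residue[1:-1] + "]"   # any amino acid except those listed
        else:
            regex = residue                      # single residue or [..] class
        if low is not None:
            regex += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(regex)
    return prefix + "".join(parts) + suffix


def evaluate(pattern, family, non_family):
    """Return (sensitivity, specificity, is_diagnostic) on a labelled training set.

    Assumed definitions: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    """
    regex = re.compile(prosite_to_regex(pattern))
    tp = sum(1 for seq in family if regex.search(seq))
    fn = len(family) - tp
    fp = sum(1 for seq in non_family if regex.search(seq))
    tn = len(non_family) - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Diagnostic: matches every family member and no non-family sequence.
    return sensitivity, specificity, fn == 0 and fp == 0


# Invented toy sequences, for illustration only.
family = ["MGAGGSGKSTL", "MSAEFRDGKSW"]
non_family = ["MKVLLEDGRTW", "MAAAAAAAAAA", "MTTGKSPPQ"]

print(evaluate("[AG]-x(4)-G-K-[ST].", family, non_family))  # (1.0, 1.0, True)
print(evaluate("G-K-[ST].", family, non_family))            # (1.0, 0.6666666666666666, False)
```

On this toy data the P-loop-like pattern `[AG]-x(4)-G-K-[ST]` separates family from non-family perfectly and is therefore diagnostic, while the shorter `G-K-[ST]` also matches one non-family sequence and is not. The sketch only checks a given pattern against a labelled training set; it is not the MAMA motif-finding procedure itself.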