Mining sequential patterns for protein fold recognition

Authors:
Themis P. Exarchos;Costas Papaloukas;Christos Lampros;Dimitrios I. Fotiadis
Affiliations:
Department of Medical Physics, Medical School, University of Ioannina, GR 451 10 Ioannina, Greece and Unit of Medical Technology and Intelligent Information Systems, Department of Computer Science ...;Unit of Medical Technology and Intelligent Information Systems, Department of Computer Science, University of Ioannina, P.O. Box 1186, GR 45110 Ioannina, Greece and Department of Biological Applic ...;Department of Medical Physics, Medical School, University of Ioannina, GR 451 10 Ioannina, Greece and Unit of Medical Technology and Intelligent Information Systems, Department of Computer Science ...;Unit of Medical Technology and Intelligent Information Systems, Department of Computer Science, University of Ioannina, P.O. Box 1186, GR 45110 Ioannina, Greece and Biomedical Research Institute - ...
Venue:
Journal of Biomedical Informatics
Year:
2008

Abstract

Protein data contain discriminative patterns that, once properly defined, can support a range of applications. In this work, sequential pattern mining (SPM) is used for sequence-based fold recognition. Classifying proteins by fold plays an important role in computational protein analysis, since it can help determine the function of a protein whose structure is unknown. Specifically, cSPADE, one of the most efficient SPM algorithms, is employed to analyze protein sequences. A classifier then uses the extracted sequential patterns to assign each protein to the appropriate fold category. To train and evaluate the proposed method, we used protein sequences from the Protein Data Bank and the annotation of the SCOP database. The method achieved an overall accuracy of 25% in a classification problem with 36 candidate categories. Classification performance reaches up to 56% when the five most probable protein folds are considered.
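The pipeline described in the abstract (mine patterns per fold, score a query against them, then evaluate on the top-ranked folds) can be illustrated with a minimal sketch. This is not cSPADE and not the authors' classifier: it assumes contiguous patterns of a fixed length as a simplified stand-in for gapped sequential patterns, and a plain "fraction of fold patterns matched" score. All function names, parameters, and the toy sequences are hypothetical.

```python
from collections import Counter


def mine_patterns(sequences, length=3, min_support=0.5):
    """Return contiguous patterns of a fixed length found in at least
    min_support (a fraction) of the sequences.
    Assumption: a simplified stand-in for cSPADE, which also handles gaps."""
    counts = Counter()
    for seq in sequences:
        # Use a set so each pattern counts once per sequence (support).
        counts.update({seq[i:i + length] for i in range(len(seq) - length + 1)})
    threshold = min_support * len(sequences)
    return {pattern for pattern, count in counts.items() if count >= threshold}


def rank_folds(query, fold_patterns):
    """Rank folds by the fraction of their patterns present in the query.
    Assumption: a toy scoring rule; ties are broken alphabetically."""
    scores = {
        fold: (sum(p in query for p in patterns) / len(patterns)) if patterns else 0.0
        for fold, patterns in fold_patterns.items()
    }
    return sorted(scores, key=lambda fold: (-scores[fold], fold))


def top_k_accuracy(labelled_queries, fold_patterns, k):
    """Fraction of queries whose true fold is among the k best-ranked folds."""
    hits = sum(
        true_fold in rank_folds(seq, fold_patterns)[:k]
        for seq, true_fold in labelled_queries
    )
    return hits / len(labelled_queries)


# Hypothetical toy data, not taken from the Protein Data Bank or SCOP.
training = {
    "foldA": ["MKVLAA", "GKVLAT", "KVLAAG"],
    "foldB": ["PPGSTE", "APGSTW", "PGSTEE"],
}
fold_patterns = {
    fold: mine_patterns(seqs, length=3, min_support=1.0)
    for fold, seqs in training.items()
}
queries = [("AKVLAG", "foldA"), ("TPGSTA", "foldB"), ("KVLPGS", "foldB")]

top1 = top_k_accuracy(queries, fold_patterns, k=1)
top2 = top_k_accuracy(queries, fold_patterns, k=2)
```

In this toy setup the third query matches one pattern from each fold, so it is missed at k=1 but recovered at k=2. That is the same effect behind the gap between the overall accuracy and the five-most-probable-folds figure reported in the abstract.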