Protein data contain discriminative patterns that can support many applications, provided the patterns are defined appropriately. In this work, sequential pattern mining (SPM) is applied to sequence-based fold recognition. Classifying proteins by fold plays an important role in computational protein analysis, since it can help determine the function of a protein whose structure is unknown. Specifically, cSPADE, one of the most efficient SPM algorithms, is used to analyze protein sequences. A classifier then uses the extracted sequential patterns to assign each protein to the appropriate fold category. To train and evaluate the proposed method, we used protein sequences from the Protein Data Bank and annotations from the SCOP database. The method achieved an overall accuracy of 25% on a classification problem with 36 candidate categories. Classification performance reaches up to 56% when the five most probable protein folds are considered.
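The pipeline described above (mine patterns per fold, rank candidate folds for a new sequence, score both the top prediction and the top few) can be illustrated with a minimal Python sketch. It is not the method of the paper: contiguous n-grams stand in for the gapped sequential patterns that cSPADE would extract, and the scoring rule (the fraction of a fold's patterns found in the query) is an assumption, since the abstract does not specify the classifier. All function names, parameters, and toy sequences are hypothetical.

```python
from collections import Counter, defaultdict


def ngrams(seq, n):
    """Return the set of contiguous length-n substrings of seq."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def mine_patterns(train, n=3, min_support=0.5):
    """Per fold, keep n-grams that occur in at least min_support of its sequences.

    Simplified stand-in for cSPADE: only contiguous patterns, no gap constraints.
    """
    by_fold = defaultdict(list)
    for seq, fold in train:
        by_fold[fold].append(seq)
    patterns = {}
    for fold, seqs in by_fold.items():
        # Each sequence contributes at most once per pattern (set semantics).
        counts = Counter(p for s in seqs for p in ngrams(s, n))
        patterns[fold] = {p for p, c in counts.items() if c / len(seqs) >= min_support}
    return patterns


def rank_folds(seq, patterns, n=3):
    """Rank folds by the fraction of their patterns present in seq, best first."""
    grams = ngrams(seq, n)
    scores = {
        fold: (len(ps & grams) / len(ps) if ps else 0.0)
        for fold, ps in patterns.items()
    }
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda fold: (-scores[fold], fold))


def top_k_accuracy(test, patterns, k=1, n=3):
    """Fraction of test sequences whose true fold is among the k best-ranked folds."""
    hits = sum(1 for seq, fold in test if fold in rank_folds(seq, patterns, n)[:k])
    return hits / len(test)


# Toy data (hypothetical sequences and fold labels, for illustration only).
train = [("ACDEFG", "a"), ("ACDEKL", "a"), ("MNPQRS", "b"), ("MNPQTV", "b")]
test_set = [("WACDEY", "a"), ("XMNPQZ", "b"), ("YYMNPY", "a")]

patterns = mine_patterns(train, n=3, min_support=1.0)
top1 = top_k_accuracy(test_set, patterns, k=1)
top2 = top_k_accuracy(test_set, patterns, k=2)
```

With 36 candidate folds, the same `top_k_accuracy` function called with `k=1` and `k=5` would correspond to the two kinds of figures reported in the abstract, although the toy data here say nothing about the values actually obtained.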