ProsMine is a system that automatically predicts protein structural class from sequence using a combination of data mining techniques. Unlike our previous protein structural class prediction system, which could handle only enzyme proteins, ProsMine can predict the structural class of any protein. We investigate the most effective way to represent protein sequences in the new prediction system. Drawing on lattice theory, our idea is to discover the set of Closed Sequences in the protein sequence database and to use the appropriate Closed Sequences as protein features. A sequence is said to be "closed" for a given protein sequence database if it is the maximal subsequence of all the protein sequences in that database. Efficient algorithms are proposed for discovering closed sequences and for selecting the appropriate closed sequences for each protein structural class. Experimental results on data extracted from the SWISS-PROT and CATH databases show that ProsMine achieves higher accuracy than our previous work, even at the most specific level of the CATH protein structure hierarchy (the Homologous-Superfamily level), which comprises 637 possible classes.
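To make the notion of a closed sequence concrete, the following is a minimal brute-force sketch, not ProsMine's algorithm. It assumes the definition commonly used in closed-pattern mining (a pattern is closed if no longer pattern containing it occurs in the same number of database sequences) and, for simplicity, it considers only contiguous substrings. The function names, the `min_support` parameter, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def substring_support(database):
    """Map every contiguous substring to the set of sequence indices containing it."""
    support = defaultdict(set)
    for idx, seq in enumerate(database):
        for i in range(len(seq)):
            for j in range(i + 1, len(seq) + 1):
                support[seq[i:j]].add(idx)
    return support


def closed_substrings(database, min_support=1):
    """Return {pattern: support} for closed contiguous substrings.

    Assumption: a pattern is closed when every one-character extension
    (to the left or right) occurs in strictly fewer sequences. For
    contiguous substrings this is equivalent to having no longer
    super-string with the same support.
    """
    support = substring_support(database)
    alphabet = set("".join(database))
    closed = {}
    for pattern, ids in support.items():
        if len(ids) < min_support:
            continue
        extensions = [c + pattern for c in alphabet] + [pattern + c for c in alphabet]
        # .get avoids inserting new keys into the defaultdict while iterating
        if all(len(support.get(ext, ())) < len(ids) for ext in extensions):
            closed[pattern] = len(ids)
    return closed


# Toy database of three short, made-up amino-acid strings
toy_db = ["MKVLA", "MKVLG", "AKVLA"]
features = closed_substrings(toy_db, min_support=2)
print(features)
```

In this toy database, `KVL` occurs in all three sequences, while `MKVL` and `KVLA` each occur in two; shorter fragments such as `KV` are not closed because `KVL` has the same support. A practical system would replace the quadratic substring enumeration with a dedicated closed-pattern mining algorithm.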