One common technique for investigating protein structure is to look for recurrent fragments (also called substructures or spatial motifs) and then use these spatial motifs as patterns that characterize the proteins under consideration. An emerging trend consists of parsing proteins into graphs of amino acids. The search for recurrent spatial motifs is then formulated as a frequent subgraph discovery problem in which each subgraph represents a spatial motif. Several efficient approaches to frequent subgraph discovery have been proposed in the literature. However, the set of discovered frequent subgraphs is too large to be analyzed and explored efficiently in any knowledge discovery process. In this paper, we propose a novel approach that shrinks the set of discovered subgraph-motifs by selecting only the representative ones. Our method selects representative subgraph-motifs based on the evolutionary information about amino acids that is encoded in substitution matrices. Our experimental results show that the approach dramatically decreases the number of motifs while enhancing their interestingness.
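The selection idea can be illustrated with a minimal sketch. It is not the paper's algorithm, only one plausible reading of "selecting representatives using a substitution matrix". It rests on several assumptions that do not come from the abstract: motifs are given as a tuple of amino-acid labels plus an edge set over a fixed node ordering (so no graph isomorphism test is performed), the scores in `TOY_SCORES` are made-up illustrative values rather than real BLOSUM or PAM entries, and the names `score`, `substitutable`, and `select_representatives` are hypothetical. A motif is dropped when an already selected motif has the same topology and every aligned pair of amino acids has a positive substitution score.

```python
# Toy substitution scores: hypothetical values for illustration only,
# NOT taken from a real BLOSUM or PAM matrix.
TOY_SCORES = {
    frozenset("LI"): 2,
    frozenset("LV"): 1,
    frozenset("IV"): 3,
    frozenset("DE"): 2,
    frozenset("KR"): 2,
}


def score(a, b):
    """Return the toy substitution score of amino acids a and b."""
    if a == b:
        return 4  # assumed score for an identical pair
    # Pairs missing from the toy table are treated as unfavorable.
    return TOY_SCORES.get(frozenset((a, b)), -1)


def substitutable(m1, m2):
    """True if m1 and m2 share the same edges and all aligned labels score > 0.

    Assumes both motifs use the same node ordering, so labels can be
    compared position by position.
    """
    labels1, edges1 = m1
    labels2, edges2 = m2
    if len(labels1) != len(labels2) or edges1 != edges2:
        return False
    return all(score(a, b) > 0 for a, b in zip(labels1, labels2))


def select_representatives(motifs):
    """Greedily keep a motif only if no kept motif can substitute for it."""
    representatives = []
    for motif in motifs:
        if not any(substitutable(motif, rep) for rep in representatives):
            representatives.append(motif)
    return representatives


# Four 3-node path motifs (node 0 - node 1 - node 2) with different labels.
path = frozenset({(0, 1), (1, 2)})
motifs = [
    (("L", "D", "K"), path),
    (("I", "E", "R"), path),  # each residue is substitutable with L-D-K
    (("V", "D", "K"), path),  # V is substitutable with L
    (("G", "D", "K"), path),  # G has no favorable substitution with L
]
reps = select_representatives(motifs)
```

In this toy run, the second and third motifs are absorbed by the first, so four motifs shrink to two representatives. The greedy order-dependent selection and the "all pairs positive" threshold are design choices made for brevity here; other criteria, such as a summed score threshold, would fit the same outline.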