Mining Substructures in Protein Data

Authors:
Fedja Hadzic;Tharam S. Dillon;Amandeep S. Sidhu;Elizabeth Chang;Henry Tan
Affiliations:
University of Technology Sydney, Australia;University of Technology Sydney, Australia;University of Technology Sydney, Australia;Curtin University of Technology Perth, Australia;University of Technology Sydney, Australia
Venue:
ICDMW '06 Proceedings of the Sixth IEEE International Conference on Data Mining - Workshops
Year:
2006

Citing 0
Cited 5

Mining Unordered Distance-Constrained Embedded Subtrees

DS '08 Proceedings of the 11th International Conference on Discovery Science
U3 - Mning Unordered Embedded Subtrees Using TMG Candidate Generation

WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Application of tree mining to matching of knowledge structures of decision tree type

OTM'07 Proceedings of the 2007 OTM Confederated international conference on On the move to meaningful internet systems - Volume Part II
A statistical interestingness measures for XML based association rules

PRICAI'10 Proceedings of the 11th Pacific Rim international conference on Trends in artificial intelligence
Model guided algorithm for mining unordered embedded subtrees

Web Intelligence and Agent Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper we consider the "Prions' database that describes protein instances stored for Human Prion Proteins. The Prions database can be viewed as a database of rooted ordered labeled subtrees. Mining frequent substructures from tree databases is an important task and it has gained a considerable amount of interest in areas such as XML mining, Bioinformatics, Web mining etc. This has given rise to the development of many tree mining algorithms which can aid in structural comparisons, association rule discovery and in general mining of tree structured knowledge representations. Previously we have developed the MB3 tree mining algorithm, which given a minimum support threshold, efficiently discovers all frequent embedded subtrees from a database of rooted ordered labeled subtrees. In this work we apply the algorithm to the Prions database in order to extract the frequently occurring patterns, which in this case are of induced subtree type. Obtaining the set of frequent induced subtrees from the Prions database can potentially reveal some useful knowledge. This aspect will be demonstrated by providing an analysis of the extracted frequent subtrees with respect to discovering interesting protein information. Furthermore, the minimum support threshold can be used as the controlling factor for answering specific queries posed on the Prions dataset. This approach is shown to be a viable technique for mining protein data.