The Subdue system evaluates the benefits of using domain knowledge to guide the discovery of repetitive, functional substructures in large structural databases. Results show that domain-specific knowledge improves the search for such substructures and enables greater data compression.

The increasing amount and complexity of today's data create an urgent need to accelerate discovery of knowledge in large databases. In response, designers have developed numerous approaches for discovering concepts in databases using a linear, attribute-value representation. These approaches address issues of data relevance, missing data, noise, and domain knowledge. However, much of the data collected is structural in nature or composed of parts and relations between the parts. Hence, there is a need for scalable tools to analyze and discover concepts in structural databases. Many reported discovery tools are also computationally expensive and cannot scale easily to large databases, especially those containing structural information.

Recently, we introduced a method for discovering substructures in structural databases using the minimum description length (MDL) principle. The system, called Subdue, discovers substructures that compress the input database and represent structural concepts. Once Subdue discovers a substructure, the system simplifies the data by replacing instances of the substructure with a pointer to the substructure definition. The discovered substructures allow abstraction over detailed structures in the original data. Iteration of the substructure discovery and replacement process constructs a hierarchical description of the structural data in terms of the discovered substructures. This hierarchy provides varying levels of interpretation that users can access based on the specific goals of the data analysis.

In this article, we focus on how to realize the benefits of domain-dependent discovery approaches by adding domain-specific knowledge to a domain-independent discovery system. We also evaluate the benefits and costs of using domain-specific information. In particular, we measure the performance of the Subdue system with and without domain-specific knowledge along the performance dimensions of compression, the time needed to discover the substructures, and the usefulness of the discovered substructures. Finally, we address the issue of scalability of structure discovery using Subdue. On the basis of scalability tests we've conducted, we highlight features of databases that can affect Subdue's performance.
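
To make the replacement step concrete, the following Python sketch compresses a small labeled graph by swapping every instance of a one-edge substructure for a single pointer vertex, then compares sizes before and after. It is an illustration under stated assumptions, not Subdue's implementation: the names `Graph`, `size`, `find_instances`, and `compress` are hypothetical, the pattern is limited to a single labeled edge, and description length is approximated by counting vertices and edges rather than by the bit-level MDL encoding.

```python
from dataclasses import dataclass


@dataclass
class Graph:
    vertices: dict  # vertex id -> vertex label
    edges: list     # (source id, target id, edge label)


def size(g):
    """Crude stand-in for description length (assumption): vertices + edges."""
    return len(g.vertices) + len(g.edges)


def find_instances(g, pattern):
    """Return non-overlapping edges (u, v) matching (src_label, edge_label, dst_label)."""
    src_label, edge_label, dst_label = pattern
    used, found = set(), []
    for u, v, label in g.edges:
        if (label == edge_label and u != v
                and g.vertices[u] == src_label and g.vertices[v] == dst_label
                and u not in used and v not in used):
            found.append((u, v))
            used.update((u, v))
    return found


def compress(g, pattern, name):
    """Replace each instance of the pattern with one vertex labeled `name`."""
    instances = find_instances(g, pattern)
    inside = set(instances)
    vertices = dict(g.vertices)
    merged = {}  # old vertex id -> id of the pointer vertex that absorbed it
    for i, (u, v) in enumerate(instances):
        pointer = f"{name}#{i}"
        vertices[pointer] = name
        for old in (u, v):
            merged[old] = pointer
            del vertices[old]
    edges = []
    for u, v, label in g.edges:
        if (u, v) in inside and label == pattern[1]:
            continue  # this edge now lives inside the substructure definition
        # Edges touching an instance are redirected to its pointer vertex.
        edges.append((merged.get(u, u), merged.get(v, v), label))
    return Graph(vertices, edges), len(instances)


# Toy database: three "A on B" pairs, each near a shared vertex C.
g = Graph(
    vertices={1: "A", 2: "B", 3: "A", 4: "B", 5: "A", 6: "B", 7: "C"},
    edges=[(1, 2, "on"), (3, 4, "on"), (5, 6, "on"),
           (2, 7, "near"), (4, 7, "near"), (6, 7, "near")],
)
pattern = ("A", "on", "B")
pattern_size = 3  # two vertices plus one edge in the substructure definition

compressed, count = compress(g, pattern, "SUB_1")
total = pattern_size + size(compressed)  # definition + rewritten graph
print(f"instances replaced: {count}")
print(f"size before: {size(g)}, after (with definition): {total}")
print(f"ratio: {total / size(g):.2f}")
```

A ratio below 1 means that the substructure definition plus the rewritten graph is smaller than the original graph. Running `compress` again on the rewritten graph, with a pattern that mentions `SUB_1`, would give the next level of the hierarchical description.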