Pruning Relations for Substructure Discovery of Multi-relational Databases

Authors:
Hongyu Guo; Herna L. Viktor; Eric Paquet
Affiliations:
School of Information Technology & Engineering, University of Ottawa, Canada; School of Information Technology & Engineering, University of Ottawa, Canada; National Research Council of Canada, Ottawa, Canada
Venue:
PKDD 2007: Proceedings of the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases
Year:
2007

Abstract

Multi-relational data mining methods discover patterns across multiple interlinked tables (relations) in a relational database. In many large organizations, such a multi-relational database spans numerous departments and/or subdivisions, each involved in a different aspect of the enterprise, such as customer profiling, fraud detection, inventory management, or financial management. For multi-relational classification, it follows that these subdivisions will have different interests in the data, which creates the need to explore various subsets of relevant relations that have high utility with respect to the target class. This paper presents a novel approach for pruning the uninteresting relations of a relational database whose relations come from such different parties and span many classification tasks. We aim to create a pruned structure that minimizes the loss of predictive performance of the final classification model. Our method identifies a set of strongly uncorrelated subgraphs to use for training and discards all others. Experimental results show that our strategy significantly reduces the size of the relational schema without sacrificing predictive accuracy.
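
The abstract does not specify how "strongly uncorrelated" subgraphs are measured or selected. As a hedged illustration only, the sketch below swaps in a well-known technique, the correlation-based feature selection (CFS) merit, Merit = k * r_cf / sqrt(k + k(k-1) * r_ff), applied to subgraphs instead of features. It is not the paper's algorithm. It assumes that each candidate subgraph has already been summarized as a per-tuple score vector (for example, the predictions of a classifier trained on that subgraph) and that the target class is encoded as 0/1. The functions pearson, merit and select_subgraphs and the subgraph names "account", "account_copy", "card" and "noise" are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def merit(subset, scores, labels):
    """CFS-style merit (assumed stand-in): rewards class correlation, penalises inter-subgraph correlation."""
    k = len(subset)
    # r_cf: mean absolute correlation between each subgraph's scores and the class labels.
    r_cf = sum(abs(pearson(scores[s], labels)) for s in subset) / k
    # r_ff: mean absolute pairwise correlation among the chosen subgraphs (0 for a single subgraph).
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(scores[a], scores[b])) for a, b in pairs) / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def select_subgraphs(scores, labels):
    """Greedy forward search: add the subgraph that most improves merit; stop when none does."""
    selected, best = [], 0.0
    remaining = sorted(scores)  # sorted so that ties are broken deterministically
    while remaining:
        cand = max(remaining, key=lambda s: merit(selected + [s], scores, labels))
        m = merit(selected + [cand], scores, labels)
        if m <= best:
            break  # every remaining subgraph is redundant or irrelevant: prune it
        selected.append(cand)
        remaining.remove(cand)
        best = m
    return selected


# Hypothetical toy data: 8 target tuples, 0/1 class labels, one score vector per subgraph.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = {
    "account":      [1, 1, 0, 0, 0, 0, 0, 0],
    "account_copy": [1, 1, 0, 0, 0, 0, 0, 0],  # redundant duplicate of "account"
    "card":         [0, 0, 1, 1, 0, 0, 0, 0],  # covers the other positives
    "noise":        [1, 0, 1, 0, 1, 0, 1, 0],  # uncorrelated with the class
}
print(select_subgraphs(scores, labels))  # ['account', 'card']
```

In this toy run, "card" is kept because it is only weakly (negatively) correlated with "account" yet still predictive. "account_copy" is pruned as redundant, and "noise" is pruned as irrelevant. The greedy search is a design shortcut for readability; an exhaustive or best-first search over subsets would score candidates the same way.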