Efficient Classification across Multiple Database Relations: A CrossMine Approach

Authors:
Xiaoxin Yin;Jiawei Han;Jiong Yang;Philip S. Yu
Affiliations:
-;IEEE;-;IEEE
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2006

Abstract

Relational databases are the most popular repositories for structured data and are thus among the richest sources of knowledge in the world. In a relational database, multiple relations are linked together via entity-relationship links. Multirelational classification is the procedure of building a classifier based on information stored in multiple relations and making predictions with it. Existing Inductive Logic Programming approaches (recently also known as Relational Mining) have proven effective, achieving high accuracy in multirelational classification. Unfortunately, most of them suffer from scalability problems with regard to the number of relations in a database. In this paper, we propose a new approach, called CrossMine, which includes a set of novel and powerful methods for multirelational classification: 1) tuple ID propagation, an efficient and flexible method for virtually joining relations, which enables convenient search among different relations; 2) new definitions for predicates and decision-tree nodes, which involve aggregated information to provide essential statistics for classification; and 3) a selective sampling method for improving scalability with regard to the number of tuples. Based on these techniques, we propose two scalable and accurate methods for multirelational classification: CrossMine-Rule, a rule-based method, and CrossMine-Tree, a decision-tree-based method. Our comprehensive experiments on both real and synthetic data sets demonstrate the high scalability and accuracy of the CrossMine approach.
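
The idea of tuple ID propagation, virtually joining relations by carrying target-tuple IDs across a link instead of materializing the join, can be illustrated with a minimal sketch. This is not the authors' implementation: the relation names (`loans`, `accounts`), the column names, the toy rows, and the helper functions `propagate_ids` and `class_counts` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def propagate_ids(target_rows, target_key, other_rows, other_key):
    """Attach to each row of other_rows the set of target-tuple IDs that
    would join with it, without building the joined relation."""
    ids_by_key = defaultdict(set)
    for row in target_rows:
        ids_by_key[row[target_key]].add(row["id"])
    return [(row, ids_by_key.get(row[other_key], set())) for row in other_rows]

def class_counts(propagated, labels, predicate):
    """Count the distinct positive and negative target tuples reachable
    through the rows of the non-target relation that satisfy predicate."""
    covered = set()
    for row, ids in propagated:
        if predicate(row):
            covered |= ids
    positives = sum(1 for tuple_id in covered if labels[tuple_id])
    return positives, len(covered) - positives

# Hypothetical toy data: `loans` is the target relation carrying class labels.
loans = [
    {"id": 1, "account_id": 10, "good": True},
    {"id": 2, "account_id": 10, "good": False},
    {"id": 3, "account_id": 20, "good": True},
    {"id": 4, "account_id": 30, "good": False},
]
accounts = [
    {"account_id": 10, "frequency": "monthly"},
    {"account_id": 20, "frequency": "weekly"},
    {"account_id": 30, "frequency": "monthly"},
]

labels = {row["id"]: row["good"] for row in loans}
propagated = propagate_ids(loans, "account_id", accounts, "account_id")

# Evaluate a candidate predicate on the non-target relation using only the
# propagated IDs and the class labels of the target tuples.
monthly = class_counts(propagated, labels, lambda r: r["frequency"] == "monthly")
weekly = class_counts(propagated, labels, lambda r: r["frequency"] == "weekly")
```

In this sketch, the positive and negative counts for a predicate on `accounts` are obtained from the propagated ID sets alone, which is the kind of statistic a rule-based or tree-based learner would need when searching across relations.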