CrossMine: Efficient Classification Across Multiple Database Relations

Authors:
Xiaoxin Yin;Jiawei Han;Jiong Yang;Philip S. Yu
Venue:
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Year:
2004

Abstract

Most of today's structured data is stored in relational databases. Such a database consists of multiple relations which are linked together conceptually via entity-relationship links in the design of relational database schemas. Multi-relational classification can be widely used in many disciplines, such as financial decision making, medical research, and geographical applications. However, most classification approaches only work on single "flat" data relations. It is usually difficult to convert multiple relations into a single flat relation without either introducing a huge, undesirable "universal relation" or losing essential information. Previous works using Inductive Logic Programming approaches (recently also known as Relational Mining) have proven effective with high accuracy in multi-relational classification. Unfortunately, they suffer from poor scalability w.r.t. the number of relations and the number of attributes in databases.

In this paper we propose CrossMine, an efficient and scalable approach for multi-relational classification. Several novel methods are developed in CrossMine, including (1) tuple ID propagation, which performs semantics-preserving virtual join to achieve high efficiency on databases with complex schemas, and (2) a selective sampling method, which makes it highly scalable w.r.t. the number of tuples in the databases. Both theoretical backgrounds and implementation techniques of CrossMine are introduced. Our comprehensive experiments on both real and synthetic databases demonstrate the high scalability and accuracy of CrossMine.
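
The general idea behind tuple ID propagation can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `loans` and `accounts` tables, their columns, and the helper names `propagate_ids` and `count_covered` are all hypothetical. The sketch assumes that IDs of target tuples are attached to the rows of a second relation that share a join-key value, so that a predicate on the second relation can be scored against class labels without materializing the join.

```python
from collections import defaultdict

# Hypothetical target relation: each tuple has an ID, a join key, and a class label.
loans = [
    {"id": 1, "account_id": 124, "label": True},
    {"id": 2, "account_id": 124, "label": True},
    {"id": 3, "account_id": 108, "label": False},
    {"id": 4, "account_id": 45, "label": False},
    {"id": 5, "account_id": 45, "label": True},
]

# Hypothetical non-target relation linked to the target by account_id.
accounts = [
    {"account_id": 124, "frequency": "monthly"},
    {"account_id": 108, "frequency": "weekly"},
    {"account_id": 45, "frequency": "monthly"},
    {"account_id": 67, "frequency": "weekly"},
]


def propagate_ids(target_rows, target_key, other_rows, other_key):
    """For each row of other_rows, return the set of target IDs that join with it."""
    ids_by_key = defaultdict(set)
    for row in target_rows:
        ids_by_key[row[target_key]].add(row["id"])
    # Rows with no joinable target tuple receive an empty set.
    return [set(ids_by_key.get(row[other_key], ())) for row in other_rows]


def count_covered(other_rows, propagated, labels, predicate):
    """Count distinct positive and negative target tuples joinable with
    at least one row of other_rows that satisfies the predicate."""
    covered = set()
    for row, ids in zip(other_rows, propagated):
        if predicate(row):
            covered |= ids
    positives = sum(1 for i in covered if labels[i])
    return positives, len(covered) - positives


labels = {row["id"]: row["label"] for row in loans}
propagated = propagate_ids(loans, "account_id", accounts, "account_id")
monthly = count_covered(accounts, propagated, labels,
                        lambda r: r["frequency"] == "monthly")
print(propagated)
print(monthly)
```

Because only sets of IDs travel between relations, the joined table is never built. In this sketch, the positive and negative counts for a candidate predicate come from one pass over the non-target relation.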