CrossMine: efficient classification across multiple database relations

Authors:
Xiaoxin Yin;Jiawei Han;Jiong Yang;Philip S. Yu
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL;IBM T.J. Watson Research Center, Yorktown Heights, N.Y.
Venue:
Proceedings of the 2004 European conference on Constraint-Based Mining and Inductive Databases
Year:
2004

Abstract

Most of today's structured data is stored in relational databases. Such a database consists of multiple relations that are linked together conceptually via entity-relationship links in the design of relational database schemas. Multi-relational classification can be widely used in many disciplines, including financial decision making and medical research. However, most classification approaches work only on single “flat” data relations. It is usually difficult to convert multiple relations into a single flat relation without either introducing a huge “universal relation” or losing essential information. Previous work using Inductive Logic Programming approaches (recently also known as Relational Mining) has proven effective, with high accuracy, in multi-relational classification. Unfortunately, these approaches fail to achieve high scalability w.r.t. the number of relations in a database because they repeatedly join different relations to search for good literals. In this paper we propose CrossMine, an efficient and scalable approach for multi-relational classification. CrossMine employs tuple ID propagation, a novel method for virtually joining relations, which enables flexible and efficient search among multiple relations. CrossMine also uses aggregated information to provide essential statistics for classification. A selective sampling method is used to achieve high scalability w.r.t. the number of tuples in the databases. Our comprehensive experiments on both real and synthetic databases demonstrate the high scalability and accuracy of CrossMine.
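
The abstract does not spell out how tuple ID propagation works, so the following Python sketch is only one plausible reading of "virtually joining relations", not the authors' implementation. It assumes that the IDs of target tuples are attached to the tuples of a linked relation through a shared key, so that class counts for a candidate condition on that relation can be computed without materializing the join. The toy `loans` and `accounts` relations, their attributes, and the helper names `propagate_ids` and `class_counts` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy schema: `loans` is the target relation (it carries the class
# label) and `accounts` is a second relation linked to it through account_id.
loans = [
    {"loan_id": 1, "account_id": 10, "label": True},
    {"loan_id": 2, "account_id": 10, "label": False},
    {"loan_id": 3, "account_id": 11, "label": True},
    {"loan_id": 4, "account_id": 12, "label": False},
]
accounts = [
    {"account_id": 10, "frequency": "monthly"},
    {"account_id": 11, "frequency": "weekly"},
    {"account_id": 12, "frequency": "monthly"},
]


def propagate_ids(target, target_key, id_field, other, other_key):
    """For each tuple of `other`, return the set of target IDs that join with it.

    This is the assumed "virtual join": only IDs travel across the link,
    no joined rows are built.
    """
    ids_by_key = defaultdict(set)
    for row in target:
        ids_by_key[row[target_key]].add(row[id_field])
    return [ids_by_key.get(row[other_key], set()) for row in other]


def class_counts(other, propagated, labels, attr, value):
    """Count distinct positive and negative target tuples covered by other.attr == value."""
    covered = set()
    for row, ids in zip(other, propagated):
        if row[attr] == value:
            covered |= ids
    positives = sum(1 for target_id in covered if labels[target_id])
    return positives, len(covered) - positives


labels = {row["loan_id"]: row["label"] for row in loans}
propagated = propagate_ids(loans, "account_id", "loan_id", accounts, "account_id")
monthly = class_counts(accounts, propagated, labels, "frequency", "monthly")
weekly = class_counts(accounts, propagated, labels, "frequency", "weekly")
print(monthly, weekly)
```

In this sketch the counts for each candidate condition come from a single pass over `accounts` plus set unions of the propagated IDs; a rule learner could feed such counts into whatever quality measure it uses to choose the next literal.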