An efficient multi-relational Naïve Bayesian classifier based on semantic relationship graph

Authors:
Hongyan Liu;Xiaoxin Yin;Jiawei Han
Affiliations:
Tsinghua University, Beijing, China;University of Illinois at Urbana-Champaign, Urbana, Illinois;University of Illinois at Urbana-Champaign, Urbana, Illinois
Venue:
MRDM '05 Proceedings of the 4th international workshop on Multi-relational mining
Year:
2005

Abstract

Classification is one of the most popular data mining tasks, with a wide range of applications, and many algorithms have been proposed for building accurate and scalable classifiers. Most of these algorithms take only a single table as input, whereas most real-world data are stored in multiple tables and managed by relational database systems. Because converting data from multiple tables into a single one usually causes many problems, multi-relational classification algorithms have become important and have attracted considerable research interest. Existing extensions of Naïve Bayes to multi-relational data either must transform the data stored in tables into main-memory Prolog facts or restrict the search space in a way that covers only a small subset of real-world applications. In this work, we aim to solve these problems by building an efficient and accurate Naïve Bayesian classifier that handles data in multiple tables directly. We propose an algorithm named Graph-NB, which upgrades the Naïve Bayesian classifier for this purpose. To take advantage of the linkage relationships among tables, and to treat the different tables linked to the target table differently, we develop a semantic relationship graph that describes these relationships and avoids unnecessary joins. Furthermore, to improve accuracy, we give a pruning strategy that simplifies the graph so that too many weakly linked tables are not examined. Experiments on both real-world and synthetic databases show that Graph-NB is highly efficient and achieves good accuracy.
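
The abstract does not give the details of Graph-NB, so the following is not the authors' algorithm. It is only a minimal sketch of the general idea the abstract describes: a Naïve Bayes classifier that reads a target table and one linked table directly, propagating class labels along the foreign key instead of joining the tables into one. The toy tables, the names `train`, `predict`, and `log_likelihood`, and the use of Laplace smoothing are all assumptions made for illustration; the semantic relationship graph and the pruning strategy are not modeled.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: a target table and one table linked to it by a foreign key.
target = {  # target id -> (attribute value, class label)
    1: ("young", "yes"),
    2: ("young", "yes"),
    3: ("old", "no"),
    4: ("old", "no"),
}
orders = [  # (target id, attribute value); several rows may point to one target tuple
    (1, "book"), (1, "book"), (2, "book"),
    (3, "tool"), (4, "tool"), (4, "book"),
]

def train(target, linked):
    """Collect per-class value counts for each table without materializing a join."""
    class_counts = Counter(label for _, label in target.values())
    counts = {"target": defaultdict(Counter), "linked": defaultdict(Counter)}
    for value, label in target.values():
        counts["target"][label][value] += 1
    for tid, value in linked:
        # Look up the class label through the foreign key and count the value under it.
        counts["linked"][target[tid][1]][value] += 1
    # Distinct values seen in each table, used for Laplace smoothing.
    vocab = {name: {v for c in tbl.values() for v in c} for name, tbl in counts.items()}
    return class_counts, counts, vocab

def log_likelihood(counts, vocab, table, label, value):
    """Laplace-smoothed log P(value | label) for one table."""
    c = counts[table][label]
    return math.log((c[value] + 1) / (sum(c.values()) + len(vocab[table])))

def predict(model, target_value, linked_values):
    """Return the class with the highest Naive Bayes log score."""
    class_counts, counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)
        score += log_likelihood(counts, vocab, "target", label, target_value)
        for v in linked_values:
            score += log_likelihood(counts, vocab, "linked", label, v)
        scores[label] = score
    return max(scores, key=scores.get)

model = train(target, orders)
prediction = predict(model, "young", ["book", "book"])
print(prediction)
```

In this sketch every linked tuple contributes one independent likelihood term, which is one simple way to handle one-to-many links; an actual multi-relational classifier may weight or aggregate linked tuples differently.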