Learning from Skewed Class Multi-relational Databases

Authors:
Hongyu Guo; Herna L. Viktor
Affiliations:
(Correspd.) School of Information Technology and Engineering, University of Ottawa, Canada. hguo028@site.uottawa.ca, hlviktor@site.uottawa.ca
Venue:
Fundamenta Informaticae - Progress on Multi-Relational Data Mining
Year:
2009

Abstract

Relational databases, with vast amounts of data (from financial transactions and marketing surveys to medical records and health informatics observations) and complex schemas, are ubiquitous in our society. Multirelational classification algorithms have been proposed to learn from such relational repositories, where multiple interconnected tables (relations) are involved. These methods search for relevant features both in a target relation (in which each tuple is associated with a class label) and in relations related to the target, in order to better classify the target relation's tuples. However, in many practical database applications, such as credit card fraud detection and disease diagnosis, the target tuples are highly imbalanced. That is, the number of examples of one class (the majority class) in the target relation is much higher than that of the others (the minority classes). Many existing methods thus tend to produce poor predictive performance on the underrepresented classes. This paper presents a strategy for dealing with such imbalanced multirelational data. The method learns from multiple views (feature sets) of relational data in order to construct view learners with different awareness of the imbalance problem. The different observations held by the multiple view learners are then combined to yield a model that has better knowledge of both the majority and minority classes in a relational database. Experiments performed on six benchmark data sets show that the proposed method achieves promising results when compared with other popular relational data mining algorithms, in terms of the ROC curve and AUC value obtained. In particular, an important result indicates that the method is superior when the class imbalance is very high.
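The idea of combining view learners and scoring the result by AUC can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not say how views are built or combined, so the plain averaging combiner, the function names `auc` and `combine_views`, and the toy scores below are all assumptions made for illustration. Each "view" is represented only by the minority-class scores a hypothetical view learner assigned to the same eight target tuples.

```python
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (minority, majority) pairs ranked correctly; ties count as half.
    labels: 1 = minority (positive) class, 0 = majority class."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def combine_views(view_scores):
    """Assumed combiner: average the per-tuple minority-class scores
    produced by each view learner (one score list per view)."""
    return [mean(per_tuple) for per_tuple in zip(*view_scores)]


# Toy target relation: 8 tuples, only 2 in the minority class (hypothetical data).
labels = [1, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical view A scores the first minority tuple highly but misses the second.
view_a = [0.9, 0.2, 0.3, 0.1, 0.1, 0.2, 0.1, 0.3]
# Hypothetical view B does the opposite.
view_b = [0.3, 0.8, 0.1, 0.4, 0.2, 0.1, 0.3, 0.2]

combined = combine_views([view_a, view_b])

if __name__ == "__main__":
    for name, scores in [("view A", view_a), ("view B", view_b), ("combined", combined)]:
        print(f"{name}: AUC = {auc(labels, scores):.3f}")
```

In this toy data each view alone ranks one minority tuple below some majority tuples, while the averaged scores rank both minority tuples above every majority tuple. AUC is used rather than accuracy because, with six of eight tuples in the majority class, a model that always predicts the majority class would already be 75% accurate.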