Learning from Skewed Class Multi-relational Databases

  • Authors:
  • Hongyu Guo; Herna L. Viktor

  • Affiliations:
  • School of Information Technology and Engineering, University of Ottawa, Canada (both authors; the first-listed author is the corresponding author). hguo028@site.uottawa.ca, hlviktor@site.uottawa.ca

  • Venue:
  • Fundamenta Informaticae - Progress on Multi-Relational Data Mining
  • Year:
  • 2009

Abstract

Relational databases, which hold vast amounts of data (from financial transactions and marketing surveys to medical records and health informatics observations) under complex schemas, are ubiquitous in our society. Multirelational classification algorithms have been proposed to learn from such relational repositories, where multiple interconnected tables (relations) are involved. These methods search for relevant features both in a target relation (in which each tuple is associated with a class label) and in the relations related to the target, in order to better classify the target relation's tuples. However, in many practical database applications, such as credit card fraud detection and disease diagnosis, the target tuples are highly imbalanced. That is, the number of examples of one class (the majority class) in the target relation is much higher than that of the others (the minority classes). Many existing methods thus tend to produce poor predictive performance on the underrepresented class in the data. This paper presents a strategy for dealing with such imbalanced multirelational data. The method learns from multiple views (feature sets) of relational data in order to construct view learners with different awareness of the imbalance problem. The different observations held by these view learners are then combined to yield a model with better knowledge of both the majority and minority classes in a relational database. Experiments performed on six benchmark data sets show that the proposed method achieves promising results, in terms of the ROC curve and AUC value obtained, when compared with other popular relational data mining algorithms. In particular, an important result indicates that the method is superior when the class imbalance is very high.
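
To illustrate why AUC, rather than accuracy, is a natural yardstick for skewed classes, and what combining view learners can look like, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify how view learners are built or combined. The score-averaging combiner, the function names `auc` and `combine_views`, and the toy scores are all assumptions made for illustration only.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (minority, majority) pairs ranked correctly.

    Ties count as half a correct pair. Label 1 marks the minority class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def combine_views(view_scores: Sequence[Sequence[float]]) -> list[float]:
    """Average per-tuple minority-class scores across view learners.

    Plain averaging is an assumed stand-in for the paper's combination step.
    """
    return [sum(col) / len(col) for col in zip(*view_scores)]


# Toy target relation: 8 majority tuples (0) and 2 minority tuples (1).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Hypothetical minority-class scores from two view learners.
view_a = [0.1, 0.2, 0.1, 0.3, 0.6, 0.2, 0.1, 0.4, 0.5, 0.9]
view_b = [0.2, 0.1, 0.3, 0.2, 0.1, 0.7, 0.2, 0.1, 0.8, 0.6]
combined = combine_views([view_a, view_b])

# A model that always predicts the majority class is 80% accurate here,
# yet its constant scores carry no ranking information.
always_majority = [0.0] * len(labels)

print(auc(view_a, labels))           # 0.9375
print(auc(view_b, labels))           # 0.9375
print(auc(combined, labels))         # 1.0
print(auc(always_majority, labels))  # 0.5
```

In this toy data each view misranks a different majority tuple, so the averaged scores separate the classes perfectly. The always-majority baseline shows how high accuracy can coexist with an uninformative AUC of 0.5 under imbalance.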