CrossMine: Efficient Classification Across Multiple Database Relations

  • Authors:
  • Xiaoxin Yin; Jiawei Han; Jiong Yang; Philip S. Yu

  • Venue:
  • ICDE '04 Proceedings of the 20th International Conference on Data Engineering
  • Year:
  • 2004

Abstract

Most of today's structured data is stored in relational databases. Such a database consists of multiple relations which are linked together conceptually via entity-relationship links in the design of relational database schemas. Multi-relational classification can be widely used in many disciplines, such as financial decision making, medical research, and geographical applications. However, most classification approaches only work on single "flat" data relations. It is usually difficult to convert multiple relations into a single flat relation without either introducing huge, undesirable "universal relation" or losing essential information. Previous works using Inductive Logic Programming approaches (recently also known as Relational Mining) have proven effective with high accuracy in multi-relational classification. Unfortunately, they suffer from poor scalability w.r.t. the number of relations and the number of attributes in databases. In this paper we propose CrossMine, an efficient and scalable approach for multi-relational classification. Several novel methods are developed in CrossMine, including (1) tuple ID propagation, which performs semantics-preserving virtual join to achieve high efficiency on databases with complex schemas, and (2) a selective sampling method, which makes it highly scalable w.r.t. the number of tuples in the databases. Both theoretical backgrounds and implementation techniques of CrossMine are introduced. Our comprehensive experiments on both real and synthetic databases demonstrate the high scalability and accuracy of CrossMine.
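
The idea behind tuple ID propagation can be illustrated with a minimal sketch. This is not the authors' implementation; it only assumes the general idea stated in the abstract, that a join can be performed "virtually" by attaching the IDs of joinable target tuples to the tuples of another relation instead of materializing the joined table. The relations (loans, accounts), the attribute names, and the helper functions propagate_ids and coverage are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy relations; all names and values are illustrative only.
# Target relation: each tuple has an ID, a foreign key, and a class label.
loans = [
    {"loan_id": 1, "account_id": 124, "label": True},
    {"loan_id": 2, "account_id": 124, "label": True},
    {"loan_id": 3, "account_id": 108, "label": False},
    {"loan_id": 4, "account_id": 45, "label": False},
    {"loan_id": 5, "account_id": 45, "label": True},
]
# Non-target relation linked to the target relation by account_id.
accounts = [
    {"account_id": 124, "frequency": "monthly"},
    {"account_id": 108, "frequency": "weekly"},
    {"account_id": 45, "frequency": "monthly"},
    {"account_id": 67, "frequency": "weekly"},
]


def propagate_ids(target, target_id, key, other):
    """For each tuple of `other`, return the set of target IDs joinable on `key`.

    No joined table is built; only ID sets are attached (a "virtual" join).
    """
    ids_by_key = defaultdict(set)
    for row in target:
        ids_by_key[row[key]].add(row[target_id])
    return [ids_by_key.get(row[key], set()) for row in other]


def coverage(other, propagated, labels, attr, value):
    """Count positive and negative target tuples covered by `attr == value`."""
    covered = set()
    for row, ids in zip(other, propagated):
        if row[attr] == value:
            covered |= ids
    positives = sum(1 for i in covered if labels[i])
    return positives, len(covered) - positives


labels = {row["loan_id"]: row["label"] for row in loans}
propagated = propagate_ids(loans, "loan_id", "account_id", accounts)

for value in ("monthly", "weekly"):
    print(value, coverage(accounts, propagated, labels, "frequency", value))
```

With the propagated ID sets in place, a candidate condition on the non-target relation (here, a value of frequency) can be scored by how many positive and negative target tuples it covers, without ever constructing the joined relation. Each ID set stays small relative to a physical join, which is the intuition for why such an approach could remain efficient on schemas with many relations.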