Fast split selection method and its application in decision tree construction from large databases

  • Authors:
  • Hung Son Nguyen; Sinh Hoa Nguyen

  • Affiliations:
  • Institute of Mathematics, Warsaw University, Banacha 2, Warsaw 02-097, Poland (Corresponding author. E-mail: son@mimuw.edu.pl); Polish-Japanese Institute of Information Technology, Koszykowa 86, 02-008, Warszawa, Poland

  • Venue:
  • International Journal of Hybrid Intelligent Systems - Hybrid Intelligence using rough sets
  • Year:
  • 2005

Abstract

We present an efficient method for constructing decision trees from large data sets that are assumed to be stored on database servers and accessible through SQL queries. The proposed method minimizes the number of simple queries needed to search for the best splits (cut points) by employing a "divide and conquer" search strategy. To make this possible, we develop novel evaluation measures defined on intervals of attribute domains; these measures are needed to estimate the quality of the best cut within a given interval. We also propose applications of the presented approach to discretization and to the construction of "soft decision trees", a novel classifier model.
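
The general idea of a divide-and-conquer cut search driven by count queries can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the in-memory `CountOracle` stands in for a database server (each `counts` call plays the role of one simple query such as a class histogram over a value range), cut quality is measured by the number of object pairs from different classes that a cut separates, and the interval score is a simple heuristic (mean of the two endpoint qualities plus half of the conflicting pairs inside the interval), not the evaluation measures developed by the authors.

```python
from bisect import bisect_right
from collections import Counter


class CountOracle:
    """Hypothetical stand-in for a database server answering count queries."""

    def __init__(self, values, labels):
        pairs = sorted(zip(values, labels))
        self.values = [v for v, _ in pairs]
        self.labels = [c for _, c in pairs]
        self.queries = 0  # number of simulated queries issued so far

    def counts(self, lo, hi):
        """Class histogram of rows with lo < value <= hi (one simulated query)."""
        self.queries += 1
        i = bisect_right(self.values, lo)
        j = bisect_right(self.values, hi)
        return Counter(self.labels[i:j])


def discern(left, right):
    """Pairs of objects from different classes separated by a cut."""
    n_left, n_right = sum(left.values()), sum(right.values())
    return n_left * n_right - sum(left[c] * right[c] for c in left)


def conflict(inside):
    """Pairs of objects from different classes lying inside an interval."""
    n = sum(inside.values())
    return (n * n - sum(v * v for v in inside.values())) // 2


def best_cut(oracle, cuts, k=4):
    """Search the sorted candidate list `cuts`, splitting into k parts per round.

    Returns (quality, cut) for the best boundary evaluated. The interval
    score is a heuristic assumption, so the result is not guaranteed to be
    the global optimum on arbitrary data.
    """
    inf = float("inf")
    total = oracle.counts(-inf, inf)
    lo, hi = 0, len(cuts) - 1
    left = oracle.counts(-inf, cuts[lo])  # class counts with value <= cuts[lo]
    best = (discern(left, total - left), cuts[lo])
    while hi > lo:
        # Boundary indices splitting [lo, hi] into at most k sub-intervals.
        bounds = sorted({lo + (hi - lo) * t // k for t in range(k + 1)})
        lefts, scored = [left], []
        for a, b in zip(bounds, bounds[1:]):
            inside = oracle.counts(cuts[a], cuts[b])
            lefts.append(lefts[-1] + inside)
            w_a = discern(lefts[-2], total - lefts[-2])
            w_b = discern(lefts[-1], total - lefts[-1])
            if w_b > best[0]:
                best = (w_b, cuts[b])
            score = (w_a + w_b + conflict(inside)) / 2
            scored.append((score, a, b, lefts[-2]))
        # Recurse into the most promising sub-interval.
        _, lo, hi, left = max(scored, key=lambda s: s[0])
        if hi - lo <= 1:
            break  # both endpoints were already evaluated
    return best
```

For example, with values 0 to 99 labelled `"a"` below 40 and `"b"` otherwise, calling `best_cut(CountOracle(values, labels), list(range(99)))` narrows the search to the class boundary while issuing far fewer simulated queries than there are candidate cuts. A larger `k` trades more queries per round for fewer rounds.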