EDLRT: Entropy-based dummy variables logistic regression tree

  • Authors:
  • Chi-Ming Tsou; Shyue-Ping Chi; Deng-Yuan Huang

  • Affiliations:
  • Department of Information Management, Lunghwa University of Science and Technology, Taiwan (corresponding author; Tel.: +886 2 82093211#6312; E-mail: im065@mail.lhu.edu.tw); Department of Information Management, Fu-Jen Catholic University, Taiwan; Institute of Applied Statistics and Information, Fu-Jen Catholic University, Taiwan

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2010

Abstract

An algorithm named EDLRT (entropy-based dummy variable logistic regression tree) has been developed for building decision trees. Its main feature is the construction of an entropy-based non-linear regression tree expressed in the form of a logistic formula. EDLRT comprises two key steps. The first step establishes a decision tree by selecting the splitting variables with maximum mutual information. The second step converts the splitting points into dummy variables, fits them into a logistic regression model, and estimates the model coefficients with a genetic algorithm or the Lasso. The mathematical treatment of various types of variables for entropy evaluation and splitting point determination is illustrated, and the advantage of using mutual information as the key criterion in splitting variable selection is elucidated. The step-by-step procedure of decision tree construction and dummy variable manipulation is illustrated with a case study. EDLRT is highly tolerant of missing values and is also effective for outlier detection; both advantages are demonstrated with case studies.
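
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the binary split of the form x <= t, the midpoint candidate thresholds, the function names (`entropy`, `mutual_information`, `best_split`, `to_dummy`) and the toy data are all assumptions made for illustration. The sketch covers only the selection of a split point by maximum mutual information and its conversion into a dummy variable; fitting the logistic regression with a genetic algorithm or the Lasso is left out.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(groups, labels):
    """I(G; Y) = H(Y) - H(Y | G) for a discrete grouping G of the samples."""
    n = len(labels)
    by_group = {}
    for g, y in zip(groups, labels):
        by_group.setdefault(g, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in by_group.values())
    return entropy(labels) - conditional


def best_split(values, labels):
    """Return (threshold, MI) of the binary split x <= t with maximum MI.

    Assumption: candidate thresholds are midpoints between consecutive
    distinct values; at least two distinct values are required.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    scored = [
        (t, mutual_information([x <= t for x in values], labels))
        for t in candidates
    ]
    return max(scored, key=lambda pair: pair[1])


def to_dummy(values, threshold):
    """Encode a split point as a 0/1 dummy variable (1 if x > threshold)."""
    return [1 if x > threshold else 0 for x in values]


# Hypothetical toy data: one numeric predictor and a binary response.
ages = [22, 25, 31, 38, 45, 52, 58, 63]
bought = [0, 0, 0, 1, 1, 1, 1, 1]

threshold, mi = best_split(ages, bought)
dummy = to_dummy(ages, threshold)
print(threshold, round(mi, 3), dummy)
```

In this toy data the chosen split separates the two classes completely, so the mutual information equals the entropy of the response. The resulting dummy column is the kind of 0/1 predictor that would then enter the logistic regression model.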