Implementation of a scalable decision forest model based on information theory

  • Authors:
  • Li-Min Wang; Xue-Bai Zang

  • Affiliations:
  • College of Computer Science and Technology, Jilin University, PR China and State Key Laboratory for Novel Software Technology, Nanjing University, PR China; College of Computer Science and Technology, Jilin University, PR China

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2011

Quantified Score

Hi-index 12.05

Abstract

One of the most challenging problems in data mining is developing scalable algorithms capable of mining massive data sets. This paper proposes a novel decision forest learning algorithm, FDF, that represents multi-level semantic knowledge of the relationship between the data and the information implied in it. By redefining the information gain of information theory, FDF provides its users with a single set of rules. Each tree in the decision forest is constructed in a bottom-up learning framework, and the number of trees and the stopping criteria are set automatically. When no existing tree matches a test sample, FDF builds new logical rules for that sample, which makes the construction process scalable. Empirical studies on a set of natural domains show that the decision forest has clear advantages in probabilistic performance.
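
For readers unfamiliar with the quantity the abstract refers to, the sketch below computes the standard Shannon information gain of an attribute, that is, the reduction in class-label entropy after splitting on that attribute. It is not FDF's redefined measure, which the abstract does not specify. The function names, the dictionary-based row format, and the toy data are assumptions made purely for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Standard information gain: label entropy minus the weighted
    entropy of the partitions induced by `attribute`.

    `rows` is a list of dicts (hypothetical format), one per sample.
    """
    # Group the class labels by the value each sample takes on `attribute`.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)

    # Weighted average entropy of the partitions after the split.
    remainder = sum(
        (len(part) / len(labels)) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


# Toy data (illustrative only): "a" separates the classes, "b" does not.
rows = [
    {"a": "x", "b": "p"},
    {"a": "x", "b": "q"},
    {"a": "y", "b": "p"},
    {"a": "y", "b": "q"},
]
labels = ["yes", "yes", "no", "no"]

gain_a = information_gain(rows, labels, "a")
gain_b = information_gain(rows, labels, "b")
```

On this toy data the attribute that perfectly separates the classes receives the higher gain, which is the behaviour a conventional top-down tree learner relies on when choosing a split. How FDF changes this measure is described in the paper itself, not here.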