BOAI: fast alternating decision tree induction based on bottom-up evaluation

  • Authors:
  • Bishan Yang;Tengjiao Wang;Dongqing Yang;Lei Chang

  • Affiliations:
  • Key Laboratory of High Confidence Software Technologies, Ministry of Education, China, School of Electronics Engineering and Computer Science, Peking University, Beijing, China (all four authors)

  • Venue:
  • PAKDD'08: Proceedings of the 12th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2008

Abstract

The Alternating Decision Tree (ADTree) is a successful boosting-based classification model with a wide range of applications. Existing ADTree induction algorithms apply a "top-down" strategy to evaluate the best split at each boosting iteration, which is very time-consuming and therefore unsuitable for modeling large data sets. This paper proposes BOAI, a fast ADTree induction algorithm based on "bottom-up" evaluation, which offers high performance on massive data without sacrificing classification accuracy. BOAI uses a pre-sorting technique and dynamically evaluates splits with a bottom-up approach based on VW-group. These techniques eliminate a large amount of redundant sorting and computation in the tree induction procedure. Experimental results on both real and synthetic data sets show that BOAI outperforms the best existing ADTree induction algorithm by a significant margin. In a real-world case study, BOAI also performs better than TreeNet and Random Forests, which are considered efficient classification models.
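
To illustrate why pre-sorting removes redundant work, here is a minimal sketch of evaluating every threshold on one numeric attribute in a single linear pass over a sort order that is computed once and reused across boosting iterations. It is not BOAI itself: the abstract does not define VW-group or the bottom-up procedure, so neither is modeled here. The function names `presort` and `best_threshold` are hypothetical, the score is the standard ADTree criterion Z = 2(sqrt(W+ W-) on each side of the split), and the sketch assumes all examples passed in already satisfy the precondition of one prediction node, so the constant term for examples outside the precondition is omitted.

```python
import math


def presort(values):
    """Return the indices that sort `values` ascending.

    Computed once per attribute; the order does not depend on the
    boosting weights, so it can be reused in every iteration.
    """
    return sorted(range(len(values)), key=values.__getitem__)


def best_threshold(values, labels, weights, order):
    """Scan a pre-sorted numeric attribute and return (z, threshold).

    Minimizes Z = 2 * (sqrt(W+_left * W-_left) + sqrt(W+_right * W-_right)),
    where W+ / W- are the summed weights of positive / negative examples.
    Assumption: labels are +1 or -1 and all examples reach the same
    prediction node.
    """
    total_pos = sum(w for w, y in zip(weights, labels) if y > 0)
    total_neg = sum(w for w, y in zip(weights, labels) if y < 0)
    left_pos = 0.0
    left_neg = 0.0
    best = (math.inf, None)
    for k in range(len(order) - 1):
        i, j = order[k], order[k + 1]
        # Move example i to the left side of the candidate split.
        if labels[i] > 0:
            left_pos += weights[i]
        else:
            left_neg += weights[i]
        if values[i] == values[j]:
            continue  # no threshold separates equal values
        # Clamp at zero to guard against floating-point round-off.
        right_pos = max(total_pos - left_pos, 0.0)
        right_neg = max(total_neg - left_neg, 0.0)
        z = 2.0 * (math.sqrt(left_pos * left_neg)
                   + math.sqrt(right_pos * right_neg))
        if z < best[0]:
            best = (z, (values[i] + values[j]) / 2.0)
    return best


# Usage: sort once, then re-evaluate with new weights each iteration.
values = [3.0, 1.0, 2.0, 4.0]
labels = [1, -1, -1, 1]
order = presort(values)
z, threshold = best_threshold(values, labels, [0.25] * 4, order)
```

With the sort order cached, each re-evaluation after a weight update costs one pass over the data instead of a fresh sort, which is the kind of redundancy the abstract refers to.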