Parallelization of Decision Tree Algorithm and its Performance Evaluation

  • Authors:
  • Kazuto Kubota;Akihiko Nakase;Hiroshi Sakai;Shigeru Oyanagi

  • Venue:
  • HPC '00 Proceedings of the Fourth International Conference on High-Performance Computing in the Asia-Pacific Region - Volume 2
  • Year:
  • 2000

Abstract

Data mining is a typical application of high-performance computing in the business field. An efficient data mining system that can handle huge amounts of data is therefore desired. This paper describes the parallel construction of decision trees, a typical algorithm for classifying large databases. C4.5, a freely available program, is parallelized for SMP machines using a thread library. Parallelism in generating a decision tree can be classified into intra-node parallelism and inter-node parallelism. Intra-node parallelism can be further classified into record parallelism, attribute parallelism, and their combination. We implemented these four parallelization methods and evaluated their effects on four kinds of test data. The results show that the effectiveness of each method depends on the characteristics of the data, and that combining multiple parallelization methods is the most effective.
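
The intra-node methods named in the abstract can be illustrated with a minimal sketch. It is not the authors' C4.5 implementation: it assumes "node" means a tree node, that attributes are categorical, and that the class label is the last field of each record. It uses plain information gain for brevity, whereas C4.5 uses gain ratio by default. All function names (`count_chunk`, `info_gain`, `best_attribute`) are hypothetical. Record parallelism is modeled by tallying slices of the records in separate tasks and merging the tallies; attribute parallelism is modeled by evaluating each attribute as its own task; running both together corresponds to the combined method. Inter-node parallelism (processing several tree nodes at once) is not shown.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from math import log2


def entropy(class_counts):
    """Shannon entropy in bits of a Counter mapping class label -> count."""
    total = sum(class_counts.values())
    return -sum((n / total) * log2(n / total) for n in class_counts.values() if n)


def count_chunk(chunk, attr):
    """Record parallelism: one task tallies (attribute value, class) pairs for its slice."""
    return Counter((row[attr], row[-1]) for row in chunk)


def info_gain(records, attr, rec_pool, n_chunks):
    """Information gain of one attribute, with tallies built from record chunks."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    chunks = [records[i:i + size] for i in range(0, len(records), size)]
    joint = Counter()
    for partial in rec_pool.map(count_chunk, chunks, [attr] * len(chunks)):
        joint.update(partial)  # merge the per-chunk tallies
    by_value, classes = {}, Counter()
    for (value, label), n in joint.items():
        by_value.setdefault(value, Counter())[label] += n
        classes[label] += n
    total = len(records)
    remainder = sum(sum(c.values()) / total * entropy(c) for c in by_value.values())
    return entropy(classes) - remainder


def best_attribute(records, n_attrs, workers=2):
    """Attribute parallelism: each attribute's gain is computed as a separate task.

    Two pools are used so that attribute tasks waiting on record tasks
    cannot starve them of worker threads.
    """
    with ThreadPoolExecutor(workers) as attr_pool, ThreadPoolExecutor(workers) as rec_pool:
        gains = list(attr_pool.map(
            lambda a: info_gain(records, a, rec_pool, workers), range(n_attrs)))
    best = max(range(n_attrs), key=gains.__getitem__)
    return best, gains


if __name__ == "__main__":
    # Toy data (made up for illustration): two attributes, then the class label.
    data = [("a", "x", "yes"), ("a", "y", "yes"), ("b", "x", "no"), ("b", "y", "no")]
    print(best_attribute(data, n_attrs=2))
```

In standard CPython the global interpreter lock prevents threads from speeding up this kind of CPU-bound counting, so the sketch shows only how the work is divided, not the speedups measured in the paper on an SMP machine.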