Parallel Formulations of Decision-Tree Classification Algorithms

  • Authors:
  • Anurag Srivastava, Eui-Hong Han, Vipin Kumar, Vineet Singh

  • Affiliations:
  • Anurag Srivastava: Digital Impact (anurag@digital-impact.com)
  • Eui-Hong Han: Department of Computer Science & Engineering, Army HPC Research Center, University of Minnesota (han@cs.umn.edu)
  • Vipin Kumar: Department of Computer Science & Engineering, Army HPC Research Center, University of Minnesota (kumar@cs.umn.edu)
  • Vineet Singh: Information Technology Lab, Hitachi America, Ltd. (vsingh@hitachi.com)

  • Venue: Data Mining and Knowledge Discovery
  • Year: 1999

Abstract

Classification decision tree algorithms are used extensively for data mining in many domains, such as retail target marketing and fraud detection. Highly parallel algorithms for constructing classification decision trees are desirable for dealing with large data sets in a reasonable amount of time. Algorithms for building classification decision trees have a natural concurrency, but are difficult to parallelize due to the inherent dynamic nature of the computation. In this paper, we present parallel formulations of classification decision tree learning algorithms based on induction. We describe two basic parallel formulations: one based on the Synchronous Tree Construction Approach and the other based on the Partitioned Tree Construction Approach. We discuss the advantages and disadvantages of these methods and propose a hybrid method that combines their good features. We also analyze the computation and communication costs of the proposed hybrid method. Moreover, experimental results on an IBM SP-2 demonstrate excellent speedups and scalability.
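The abstract only outlines the two formulations; as a rough illustration (not the authors' implementation), the following Python sketch mimics the idea behind the Synchronous Tree Construction Approach: each "processor" holds a horizontal slice of the training records, local class histograms for every candidate split are summed across processors (the step a real implementation would perform with an MPI all-reduce), and one best split is then chosen globally. All names here (gini, local_histograms, synchronous_best_split, the toy data) are hypothetical and chosen only for this sketch.

```python
# Illustrative sketch only: a serial simulation of a synchronous
# tree-construction step.  Not the paper's code; a real parallel version
# would replace the inner loop over partitions with an MPI all-reduce.
from collections import Counter


def gini(counts):
    """Gini impurity of a class-count dictionary."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def local_histograms(partition, feature, threshold):
    """Class histograms of one processor's records for a candidate split."""
    left, right = Counter(), Counter()
    for x, label in partition:
        (left if x[feature] <= threshold else right)[label] += 1
    return left, right


def synchronous_best_split(partitions, candidate_tests):
    """All processors cooperate on the same tree node: local histograms are
    combined (the communication step) and a single split is selected."""
    best = None
    for feature, threshold in candidate_tests:
        left, right = Counter(), Counter()
        for part in partitions:          # stands in for the all-reduce
            l, r = local_histograms(part, feature, threshold)
            left += l
            right += r
        n = sum(left.values()) + sum(right.values())
        score = (sum(left.values()) * gini(left) +
                 sum(right.values()) * gini(right)) / n
        if best is None or score < best[0]:
            best = (score, feature, threshold)
    return best


if __name__ == "__main__":
    # Two "processors", each holding a horizontal slice of the data.
    partitions = [
        [((2.0, 1.0), "A"), ((3.5, 0.0), "B")],
        [((1.0, 2.0), "A"), ((4.0, 1.5), "B")],
    ]
    tests = [(0, 2.5), (1, 1.2)]         # (feature index, threshold) candidates
    print(synchronous_best_split(partitions, tests))
```

By contrast, under the Partitioned Tree Construction Approach described in the abstract, different processors (or groups of processors) would take ownership of different subtrees once the frontier grows large enough, avoiding the per-node synchronization shown above at the cost of redistributing data.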