Parallel decision tree with application to water quality data analysis

  • Authors:
  • Qing He;Zhi Dong;Fuzhen Zhuang;Tianfeng Shang;Zhongzhi Shi

  • Affiliations:
  • The Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, China;The Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, China,Graduate University of Chinese Academy of Sciences, China;The Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, China;The Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, China,Graduate University of Chinese Academy of Sciences, China;The Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, China

  • Venue:
  • ISNN'12 Proceedings of the 9th international conference on Advances in Neural Networks - Volume Part II
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Decision tree is a popular classification technique in many applications, such as retail target marketing, fraud detection and design of telecommunication service plans. With the information exploration, the existing classification algorithms are not good enough to tackle large data set. In order to deal with the problem, many researchers try to design efficient parallel classification algorithms. Based on the current and powerful parallel programming framework -- MapReduce, we propose a parallel ID3 classification algorithm(PID3 for short). We use water quality data monitoring the Changjiang River which contains 17 branches as experimental data. As the data are time series, we process the data to attribute data before using the decision tree. The experimental results demonstrate that the proposed algorithm can scale well and efficiently process large datasets on commodity hardware.