The decision tree is a popular classification technique in many applications, such as retail target marketing, fraud detection, and the design of telecommunication service plans. With the explosion of information, existing classification algorithms are no longer adequate for large data sets. To address this problem, many researchers have tried to design efficient parallel classification algorithms. Building on MapReduce, a popular and powerful parallel programming framework, we propose a parallel ID3 classification algorithm (PID3 for short). As experimental data we use water-quality monitoring data from the Changjiang River, covering 17 branches. Because these data are time series, we first convert them into attribute data before building the decision tree. The experimental results demonstrate that the proposed algorithm scales well and efficiently processes large datasets on commodity hardware.
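To make the idea of a MapReduce-based ID3 more concrete, here is a minimal single-process sketch. It is not the PID3 implementation, and the abstract gives no details of that algorithm. It follows a common count-based pattern for parallel decision-tree induction. The map phase emits `((attribute, value, class), 1)` for each record. The reduce phase sums those counts. A driver then computes ID3's information gain from the aggregated counts and picks the split attribute. The function names, the attribute names (`ph`, `dissolved_oxygen`), and the label `grade` are hypothetical. A real deployment would run the map and reduce steps as distributed Hadoop tasks rather than in-process generators.

```python
from collections import defaultdict
from itertools import chain
from math import log2


def map_record(record, label_key):
    """Map phase (hypothetical): emit ((attribute, value, class), 1) per non-label attribute."""
    label = record[label_key]
    for attr, value in record.items():
        if attr != label_key:
            yield (attr, value, label), 1


def reduce_counts(pairs):
    """Reduce phase (hypothetical): sum the emitted 1s for each (attribute, value, class) key."""
    counts = defaultdict(int)
    for key, one in pairs:
        counts[key] += one
    return counts


def entropy(class_counts):
    """Shannon entropy (base 2) of a {class: count} distribution."""
    total = sum(class_counts.values())
    return -sum((c / total) * log2(c / total) for c in class_counts.values() if c > 0)


def best_attribute(records, label_key):
    """Driver: choose the attribute with the highest information gain (the ID3 criterion)."""
    # Simulate the shuffle: feed every mapper's output into the reducer.
    counts = reduce_counts(chain.from_iterable(map_record(r, label_key) for r in records))

    # Class distribution of the current node, used for the base entropy.
    node_classes = defaultdict(int)
    for r in records:
        node_classes[r[label_key]] += 1
    n = len(records)
    base = entropy(node_classes)

    # Regroup the reduced counts as attribute -> value -> {class: count}.
    table = defaultdict(lambda: defaultdict(dict))
    for (attr, value, label), c in counts.items():
        table[attr][value][label] = c

    # Gain(A) = H(node) - sum over values v of |S_v|/|S| * H(S_v)
    gains = {}
    for attr, by_value in table.items():
        conditional = sum(sum(cls.values()) / n * entropy(cls) for cls in by_value.values())
        gains[attr] = base - conditional
    return max(gains, key=gains.get), gains


# Toy, already-discretized records (hypothetical attributes and labels).
records = [
    {"ph": "high", "dissolved_oxygen": "low", "grade": "bad"},
    {"ph": "high", "dissolved_oxygen": "high", "grade": "good"},
    {"ph": "low", "dissolved_oxygen": "low", "grade": "bad"},
    {"ph": "low", "dissolved_oxygen": "high", "grade": "good"},
]

best, gains = best_attribute(records, "grade")
print(best, gains)
```

In this toy data `dissolved_oxygen` separates the classes perfectly, so its gain is 1.0 bit, while `ph` has zero gain. The counting step is the part that parallelizes naturally. Each mapper sees only its own input split, and the reducer needs only the per-key sums, so the driver never has to scan the raw data to evaluate a split.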