Induction of multiclass multifeature split decision trees from distributed data
Pattern Recognition
Decision tree-based classification is a popular approach for pattern recognition and data mining. Most decision tree induction methods assume that the training data are present at a single central location. Given the growth of distributed databases at geographically dispersed locations, methods for decision tree induction in distributed settings are gaining importance. This paper extends two well-known decision tree methods from centralized data to distributed data settings. The first method is an extension of the CHAID algorithm and generates single-feature, multi-way split decision trees. The second method is based on Fisher's linear discriminant (FLD) function and generates multifeature binary trees. Both methods aim to generate compact trees and are able to handle multiple classes. The suggested extensions for distributed environments are compared with their centralized counterparts and with each other. Theoretical analysis and experimental tests demonstrate the effectiveness of the extensions. In addition, a side-by-side comparison highlights the advantages and deficiencies of the two methods under different settings of the distributed environment.
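A key property that makes FLD-based splits amenable to distributed induction is that Fisher's discriminant depends on the data only through per-class counts, feature sums, and scatter matrices, all of which each site can compute locally and transmit for exact aggregation. The following two-class sketch illustrates this idea in general terms; the function names and NumPy setup are illustrative assumptions, not the paper's actual protocol:

```python
import numpy as np

def site_statistics(X, y):
    # Per-site sufficient statistics for each class: count, feature sum,
    # and uncentered scatter (sum of outer products of the samples).
    stats = {}
    for c in np.unique(y):
        Xc = X[y == c]
        stats[int(c)] = (len(Xc), Xc.sum(axis=0), Xc.T @ Xc)
    return stats

def merge_statistics(all_stats):
    # Aggregating across sites is simple addition of the local statistics.
    merged = {}
    for stats in all_stats:
        for c, (n, s, ss) in stats.items():
            if c not in merged:
                merged[c] = [0, 0.0, 0.0]
            merged[c][0] += n
            merged[c][1] = merged[c][1] + s
            merged[c][2] = merged[c][2] + ss
    return merged

def fld_direction(merged):
    # Fisher direction w = Sw^{-1} (m0 - m1), recovered from merged statistics.
    (n0, s0, ss0), (n1, s1, ss1) = merged[0], merged[1]
    m0, m1 = s0 / n0, s1 / n1
    # Within-class scatter from uncentered moments: S_c = SS_c - n_c * m_c m_c^T
    Sw = (ss0 - n0 * np.outer(m0, m0)) + (ss1 - n1 * np.outer(m1, m1))
    return np.linalg.solve(Sw, m0 - m1)
```

Because the merged statistics are identical to those of the pooled data, the discriminant computed this way matches the centralized one exactly, at a communication cost that depends on the feature dimension rather than the number of samples.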