From centralized to distributed decision tree induction using CHAID and Fisher's linear discriminant function algorithms

  • Authors:
  • Jie Ouyang; Nilesh Patel; Ishwar K. Sethi

  • Affiliations:
  • Department of System Engineering and Computer Science, Oakland University, Rochester, MI (all authors)

  • Venue:
  • Intelligent Decision Technologies
  • Year:
  • 2011

Abstract

Decision tree-based classification is a popular approach for pattern recognition and data mining. Most decision tree induction methods assume that the training data are available at a single central location. Given the growth of distributed databases at geographically dispersed locations, methods for decision tree induction in distributed settings are gaining importance. This paper extends two well-known decision tree methods for centralized data to distributed data settings. The first method is an extension of the CHAID algorithm and generates single-feature, multi-way split decision trees. The second method is based on Fisher's linear discriminant (FLD) function and generates multifeature binary trees. Both methods aim to generate compact trees and are able to handle multiple classes. The suggested extensions for the distributed environment are compared to their centralized counterparts and to each other. Theoretical analysis and experimental tests demonstrate the effectiveness of the extensions. In addition, the side-by-side comparison highlights the advantages and deficiencies of these methods under different settings of the distributed environment.
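
The abstract refers to two split strategies: CHAID-style multi-way splits selected by a chi-square test on a single feature, and binary splits along a Fisher's linear discriminant (FLD) direction computed from all features. The sketch below is a minimal, hypothetical illustration of these two split criteria in the centralized, two-class case only; it is not the authors' distributed algorithm, and the helper names chaid_split_score and fld_direction are assumptions introduced for this example.

```python
# Minimal sketch (assumption, not the paper's implementation) of the two split
# criteria named in the abstract: a CHAID-style chi-square score for a single
# categorical feature, and a Fisher's linear discriminant (FLD) direction used
# for a multifeature binary split. Centralized data, two classes only.
import numpy as np
from scipy.stats import chi2_contingency


def chaid_split_score(feature_values, labels):
    """Chi-square statistic of the contingency table between one categorical
    feature and the class labels (a higher value suggests a stronger split)."""
    categories = np.unique(feature_values)
    classes = np.unique(labels)
    table = np.array([[np.sum((feature_values == c) & (labels == k))
                       for k in classes] for c in categories])
    chi2, p_value, _, _ = chi2_contingency(table)
    return chi2, p_value


def fld_direction(X, y):
    """Fisher's linear discriminant direction w = Sw^{-1}(m1 - m0) for a
    two-class problem; samples are split by thresholding the projection X @ w."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix, lightly regularized for invertibility.
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1)
          + 1e-6 * np.eye(X.shape[1]))
    w = np.linalg.solve(Sw, m1 - m0)
    # Threshold halfway between the projected class means.
    threshold = 0.5 * ((X0 @ w).mean() + (X1 @ w).mean())
    return w, threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 200 samples, 2 numeric features, 1 categorical feature.
    y = rng.integers(0, 2, size=200)
    X = rng.normal(size=(200, 2)) + y[:, None] * 1.5
    cat = np.where(rng.random(200) < 0.3 + 0.4 * y, "A", "B")
    print("CHAID-style chi-square, p-value:", chaid_split_score(cat, y))
    w, t = fld_direction(X, y)
    print("FLD split: project on", w, "and compare with threshold", t)
```

In a distributed setting such as the one the paper addresses, the contingency counts used by the chi-square test and the class means and scatter matrices used by the FLD step would presumably have to be aggregated across sites before these computations; that aggregation is not shown in this sketch.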