Decision trees are commonly used in supervised classification. Supervised classification problems with large training sets are now very common, yet many supervised classifiers cannot handle this amount of data. Some decision tree induction algorithms can process large training sets, but almost all of them have memory restrictions because they must keep the whole training set, or a large portion of it, in main memory. Algorithms that avoid this restriction either have to select a subset of the training set, which requires extra time, or require the user to specify parameter values that can be very difficult to determine. In this paper, we present a new fast heuristic for building decision trees from large training sets that overcomes some of the restrictions of state-of-the-art algorithms by using all the instances of the training set without storing all of them in main memory. Experimental results show that our algorithm is faster than the most recent algorithms for building decision trees from large training sets.
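To make the core idea concrete, the following is a minimal illustrative sketch (not the authors' algorithm) of how a tree node can consume every training instance while keeping only sufficient statistics in memory: each instance updates per-(attribute, value, class) counts and is then discarded, so memory grows with the number of distinct values and classes rather than with the number of instances. The class name `NodeStatistics` and the Gini-based split criterion are assumptions chosen for illustration.

```python
from collections import defaultdict


class NodeStatistics:
    """Sufficient statistics for one tree node.

    Stores per-(attribute, value, class) counts instead of the instances
    themselves, so memory is bounded by the number of distinct attribute
    values and classes, not by the training-set size.
    """

    def __init__(self, n_attributes):
        self.n_attributes = n_attributes
        # counts[attr][(value, label)] -> number of instances seen
        self.counts = [defaultdict(int) for _ in range(n_attributes)]
        self.class_counts = defaultdict(int)
        self.n_seen = 0

    def update(self, instance, label):
        """Fold one instance into the counts; the instance is then discarded."""
        self.n_seen += 1
        self.class_counts[label] += 1
        for attr, value in enumerate(instance):
            self.counts[attr][(value, label)] += 1

    @staticmethod
    def _gini(hist, total):
        return 1.0 - sum((c / total) ** 2 for c in hist.values())

    def best_split(self):
        """Return (attribute index, gain) for the split that most reduces
        a weighted Gini impurity, computed from the counts alone."""
        parent_gini = self._gini(self.class_counts, self.n_seen)
        best_attr, best_gain = None, 0.0
        for attr in range(self.n_attributes):
            # Regroup the flat (value, label) counts into per-value histograms.
            by_value = defaultdict(lambda: defaultdict(int))
            for (value, label), c in self.counts[attr].items():
                by_value[value][label] += c
            weighted = sum(
                (sum(h.values()) / self.n_seen) * self._gini(h, sum(h.values()))
                for h in by_value.values()
            )
            gain = parent_gini - weighted
            if gain > best_gain:
                best_attr, best_gain = attr, gain
        return best_attr, best_gain


# Stream the training set through the node one instance at a time.
stats = NodeStatistics(n_attributes=2)
for instance, label in [(("a", "x"), 0), (("a", "x"), 0),
                        (("b", "x"), 1), (("b", "x"), 1)]:
    stats.update(instance, label)
attr, gain = stats.best_split()  # attribute 0 separates the classes perfectly
```

A real induction algorithm would apply this recursively: once a split is chosen, child nodes get fresh statistics and the data is streamed through again (or routed on the fly), still never materializing the full training set in memory.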