Symbolic and Neural Learning Algorithms: An Experimental Comparison. Machine Learning.
C4.5: Programs for Machine Learning.
On the Optimality of the Simple Bayesian Classifier under Zero-One Loss. Machine Learning (special issue on learning with probabilistic representations).
Inductive learning algorithms and representations for text categorization. Proceedings of the Seventh International Conference on Information and Knowledge Management.
Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations.
A Survey of Methods for Scaling Up Inductive Algorithms. Data Mining and Knowledge Discovery.
RainForest: A Framework for Fast Decision Tree Construction of Large Datasets. Data Mining and Knowledge Discovery.
SLIQ: A Fast Scalable Classifier for Data Mining. EDBT '96: Proceedings of the 5th International Conference on Extending Database Technology (Advances in Database Technology).
SPRINT: A Scalable Parallel Classifier for Data Mining. VLDB '96: Proceedings of the 22nd International Conference on Very Large Data Bases.
Tree Induction vs. Logistic Regression: A Learning-Curve Analysis. Journal of Machine Learning Research.
Learning from Little: Comparison of Classifiers Given Little Training. PKDD '04: Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases.
Learning Bayesian Networks with Local Structure. UAI '96: Proceedings of the Twelfth International Conference on Uncertainty in Artificial Intelligence.
Stochastic Gradient Boosted Distributed Decision Trees. Proceedings of the 18th ACM Conference on Information and Knowledge Management.
A Modified Short and Fukunaga Metric Based on the Attribute Independence Assumption. Pattern Recognition Letters.
Entropy-Guided Feature Generation for Structured Learning of Portuguese Dependency Parsing. PROPOR '12: Proceedings of the 10th International Conference on Computational Processing of the Portuguese Language.
An Augmented Value Difference Measure. Pattern Recognition Letters.
A Self-Learning Nurse Call System. Computers in Biology and Medicine.
Hybrid Random Subsample Classifier Ensemble for High Dimensional Data Sets. International Journal of Hybrid Intelligent Systems.
There is growing interest in scaling up the widely used decision-tree learning algorithms to very large data sets. Although numerous diverse techniques have been proposed, a fast tree-growing algorithm that incurs neither a substantial decrease in accuracy nor a substantial increase in space complexity is still needed. In this paper, we present a novel, fast decision-tree learning algorithm that is based on a conditional independence assumption. The new algorithm has a time complexity of O(m · n), where m is the size of the training data and n is the number of attributes. This is a significant asymptotic improvement over the O(m · n²) time complexity of the standard decision-tree learning algorithm C4.5, at the cost of an additional space increase of only O(n). Experiments show that our algorithm performs competitively with C4.5 in accuracy on a large number of UCI benchmark data sets, and performs better and significantly faster than C4.5 on a large number of text-classification data sets. The time complexity of our algorithm is as low as that of naive Bayes; indeed, it is as fast as naive Bayes but, in our experiments, outperforms it in accuracy. Our algorithm is a core tree-growing algorithm that can be combined with other scaling-up techniques to achieve further speedup.
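The abstract does not spell out the tree-growing procedure itself, so the following is only a minimal, hypothetical sketch of conventional entropy-based tree growing over nominal attributes, written to make the per-node cost structure concrete: at each node, every remaining attribute is scored from class-conditional value counts gathered in a single pass over that node's examples, i.e. O(m · n) work per node. It is not the paper's algorithm, and all identifiers (fast_tree, split_scores, the toy data) are illustrative.

```python
# Illustrative sketch only -- NOT the paper's algorithm.  It grows a plain
# entropy-scored decision tree over nominal attributes so the per-node cost
# (one pass over the node's examples to score all remaining attributes) is
# visible.  All names are hypothetical.
from collections import Counter, defaultdict
from math import log2


def entropy(counts):
    """Shannon entropy of a class-count distribution (a Counter)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values() if c)


def split_scores(rows, labels, attrs):
    """Expected class entropy after splitting on each attribute (lower is better).

    One pass over the node's m examples accumulates class-conditional value
    counts for all n remaining attributes, i.e. O(m * n) work per node.
    """
    counts = {a: defaultdict(Counter) for a in attrs}
    for row, y in zip(rows, labels):
        for a in attrs:
            counts[a][row[a]][y] += 1
    m = len(rows)
    return {a: sum(sum(c.values()) / m * entropy(c) for c in counts[a].values())
            for a in attrs}


def fast_tree(rows, labels, attrs, depth=0, max_depth=5):
    """Recursively grow a tree; leaves hold the majority class label."""
    class_counts = Counter(labels)
    majority = class_counts.most_common(1)[0][0]
    if not attrs or depth == max_depth or len(class_counts) == 1:
        return majority                                   # leaf node
    scores = split_scores(rows, labels, attrs)
    best = min(scores, key=scores.get)
    node = {"attr": best, "children": {}, "default": majority}
    partitions = defaultdict(list)
    for row, y in zip(rows, labels):
        partitions[row[best]].append((row, y))
    rest = [a for a in attrs if a != best]
    for value, part in partitions.items():
        prows, plabels = zip(*part)
        node["children"][value] = fast_tree(list(prows), list(plabels), rest,
                                            depth + 1, max_depth)
    return node


def predict(node, row):
    """Follow the tree; fall back to a node's majority class on unseen values."""
    while isinstance(node, dict):
        node = node["children"].get(row[node["attr"]], node["default"])
    return node


if __name__ == "__main__":
    # Tiny toy data set of nominal attributes (hypothetical).
    X = [{"outlook": "sunny", "windy": "no"},  {"outlook": "sunny", "windy": "yes"},
         {"outlook": "rain",  "windy": "no"},  {"outlook": "rain",  "windy": "yes"}]
    y = ["play", "play", "play", "stay"]
    tree = fast_tree(X, y, ["outlook", "windy"])
    print(predict(tree, {"outlook": "rain", "windy": "yes"}))  # -> stay
```

Under the abstract's account, it is the conditional independence assumption that brings the total cost down to O(m · n); the sketch above only shows the conventional per-node rescanning and rescoring that such an assumption would replace.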