Is random model better? On its accuracy and efficiency

  • Authors:
  • Wei Fan, Haixun Wang, Philip S. Yu, Sheng Ma


  • Venue:
  • ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
  • Year:
  • 2003


Abstract

Inductive learning searches for an optimal hypothesis that minimizes a given loss function. It is usually assumed that the simplest hypothesis that fits the data is the best approximation to an optimal hypothesis. Since finding the simplest hypothesis is NP-hard for most representations, we generally employ various heuristics to search for its closest match. Computing these heuristics incurs significant cost, making learning inefficient and unscalable for large datasets. At the same time, it is still questionable whether the simplest hypothesis is indeed the closest approximation to the optimal model. The recent success of combining multiple models, such as bagging, boosting and meta-learning, has greatly improved accuracy over the single simplest hypothesis, providing a strong argument against the optimality of the simplest hypothesis. However, computing these combined hypotheses incurs significantly higher cost. In this paper, we first argue that as long as the error of a hypothesis on each example is within a range dictated by a given loss function, it can still be optimal. Contrary to common belief, we propose a completely random decision tree algorithm that achieves much higher accuracy than the single best hypothesis and is comparable to boosted or bagged multiple best hypotheses. The advantage of multiple random trees is their training efficiency as well as their minimal memory requirement.
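
To make the core idea concrete, below is a minimal sketch (not the authors' exact procedure) of a "completely random" decision-tree ensemble: splits are chosen without computing any heuristic such as information gain, leaves record class distributions, and posterior estimates are averaged over trees. The class names, the uniform-threshold split rule, the depth limit, and the averaging rule are illustrative assumptions for numeric features.

```python
# Sketch of a completely random decision-tree ensemble (assumptions noted above).
import numpy as np

class RandomTree:
    def __init__(self, max_depth=10, rng=None):
        self.max_depth = max_depth
        self.rng = rng or np.random.default_rng()

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.root_ = self._build(X, y, depth=0)
        return self

    def _build(self, X, y, depth):
        # Class distribution of the training rows reaching this node.
        counts = np.array([(y == c).sum() for c in self.classes_], dtype=float)
        if depth >= self.max_depth or len(np.unique(y)) <= 1 or len(y) < 2:
            return {"dist": counts / counts.sum()}
        # Completely random split: random feature, random threshold drawn
        # uniformly between that feature's observed min and max.
        f = int(self.rng.integers(X.shape[1]))
        lo, hi = X[:, f].min(), X[:, f].max()
        if lo == hi:
            return {"dist": counts / counts.sum()}
        t = self.rng.uniform(lo, hi)
        mask = X[:, f] <= t
        return {
            "feature": f, "threshold": t,
            "left": self._build(X[mask], y[mask], depth + 1),
            "right": self._build(X[~mask], y[~mask], depth + 1),
        }

    def predict_proba(self, X):
        return np.array([self._proba(self.root_, x) for x in X])

    def _proba(self, node, x):
        if "dist" in node:
            return node["dist"]
        child = "left" if x[node["feature"]] <= node["threshold"] else "right"
        return self._proba(node[child], x)

class RandomTreeEnsemble:
    """Average posterior estimates over several completely random trees."""
    def __init__(self, n_trees=30, max_depth=10, seed=0):
        rng = np.random.default_rng(seed)
        self.trees = [RandomTree(max_depth, rng) for _ in range(n_trees)]

    def fit(self, X, y):
        for tree in self.trees:
            tree.fit(X, y)
        self.classes_ = self.trees[0].classes_
        return self

    def predict(self, X):
        probs = np.mean([t.predict_proba(X) for t in self.trees], axis=0)
        return self.classes_[probs.argmax(axis=1)]
```

Because no impurity measure is evaluated, each split costs only a single pass to partition the data, which reflects the training-efficiency and memory advantages the abstract claims for multiple random trees.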