This paper reviews the suitability for large data sets of standard machine learning algorithms, most of which were developed in the context of small data sets. Sampling and parallelisation have proved useful means of reducing computation time when learning from large data sets. However, such methods assume that algorithms designed for what are now considered small data sets are also fundamentally suitable for large ones. It is plausible that optimal learning from large data sets requires a different type of algorithm from optimal learning from small data sets. This paper investigates one respect in which data set size may affect the requirements of a learning algorithm: the bias plus variance decomposition of classification error. Experiments show that learning from large data sets may be more effective when using an algorithm that places greater emphasis on bias management rather than on variance management.
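The bias plus variance decomposition referred to above can be estimated empirically by training a learner on many independent training samples and comparing its predictions with the modal ("central") prediction at each test point. The sketch below is illustrative only and is not the paper's experimental setup: it assumes a synthetic 1-D threshold problem and a decision-stump learner, and it uses one common convention for 0/1 loss in which bias is the error of the modal prediction and variance is the mean disagreement with that modal prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Synthetic 1-D problem: true label is 1 iff x > 0, with 10% label noise.
    x = rng.normal(size=n)
    y = (x > 0).astype(int)
    flip = rng.random(n) < 0.1
    y[flip] ^= 1
    return x, y

def stump_fit(x, y):
    # Decision stump: pick the training-point threshold with lowest 0/1 error.
    best_t, best_err = 0.0, 1.0
    for t in x:
        err = np.mean((x > t).astype(int) != y)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# Fixed test points with known noise-free labels.
x_test = np.linspace(-2, 2, 201)
y_test = (x_test > 0).astype(int)

# Train the learner on many independent training samples.
preds = []
for _ in range(50):
    x_tr, y_tr = make_data(30)
    t = stump_fit(x_tr, y_tr)
    preds.append((x_test > t).astype(int))
preds = np.array(preds)  # shape: (runs, test points)

main = (preds.mean(axis=0) > 0.5).astype(int)  # modal prediction per test point
bias = np.mean(main != y_test)                 # error of the modal prediction
var = np.mean(preds != main[None, :])          # mean disagreement with the mode
err = np.mean(preds != y_test[None, :])        # overall 0/1 error

print(f"bias={bias:.3f} variance={var:.3f} error={err:.3f}")
```

Under this convention, at each test point the error rate equals the variance where the modal prediction is correct and one minus the variance where it is wrong, so error, bias, and variance are linked point by point rather than by simple addition. The paper's observation is that as the number of training samples grows, variance of this kind tends to shrink, leaving bias as the dominant component of error.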