This paper reviews the suitability for large data sets of standard machine learning algorithms, most of which were developed in the context of small data sets. Sampling and parallelisation have proved useful means of reducing computation time when learning from large data sets. However, such methods assume that algorithms designed for what are now considered small data sets are also fundamentally suitable for large ones. It is plausible that optimal learning from large data sets requires a different type of algorithm from optimal learning from small data sets. This paper investigates one respect in which data set size may affect the requirements of a learning algorithm: the bias plus variance decomposition of classification error. Experiments show that learning from large data sets may be more effective when using an algorithm that places greater emphasis on bias management rather than on variance management.
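The bias plus variance decomposition referred to above can be estimated empirically by training a learner on many independent training samples and comparing its predictions with the modal ("central") prediction at each test point. The sketch below is illustrative only and is not the paper's experimental setup: it assumes a synthetic 1-D threshold problem and a decision-stump learner, and it uses one common convention for 0/1 loss in which bias is the error of the modal prediction and variance is the mean disagreement with that modal prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Synthetic 1-D problem: true label is 1 iff x > 0, with 10% label noise.
    x = rng.normal(size=n)
    y = (x > 0).astype(int)
    flip = rng.random(n) < 0.1
    y[flip] ^= 1
    return x, y

def stump_fit(x, y):
    # Decision stump: pick the training-point threshold with lowest 0/1 error.
    best_t, best_err = 0.0, 1.0
    for t in x:
        err = np.mean((x > t).astype(int) != y)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# Fixed test points with known noise-free labels.
x_test = np.linspace(-2, 2, 201)
y_test = (x_test > 0).astype(int)

# Train the learner on many independent training samples.
preds = []
for _ in range(50):
    x_tr, y_tr = make_data(30)
    t = stump_fit(x_tr, y_tr)
    preds.append((x_test > t).astype(int))
preds = np.array(preds)  # shape: (runs, test points)

main = (preds.mean(axis=0) > 0.5).astype(int)  # modal prediction per test point
bias = np.mean(main != y_test)                 # error of the modal prediction
var = np.mean(preds != main[None, :])          # mean disagreement with the mode
err = np.mean(preds != y_test[None, :])        # overall 0/1 error

print(f"bias={bias:.3f} variance={var:.3f} error={err:.3f}")
```

Under this convention, at each test point the error rate equals the variance where the modal prediction is correct and one minus the variance where it is wrong, so error, bias, and variance are linked point by point rather than by simple addition. The paper's observation is that as the number of training samples grows, variance of this kind tends to shrink, leaving bias as the dominant component of error.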