Inductive learning in less than one sequential data scan

Authors:
Wei Fan;Haixun Wang;Philip S. Yu;Shaw-Hwa Lo
Affiliations:
IBM T.J.Watson Research, Hawthorne, NY;IBM T.J.Watson Research, Hawthorne, NY;IBM T.J.Watson Research, Hawthorne, NY;Statistics Department, Columbia University, New York, NY
Venue:
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Year:
2003

Citing 8
Cited 1

Bagging predictors

Machine Learning
Online aggregation

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
A decision-theoretic generalization of on-line learning and an application to boosting

Journal of Computer and System Sciences - Special issue: 26th annual ACM symposium on the theory of computing & STOC'94, May 23–25, 1994, and second annual Europe an conference on computational learning theory (EuroCOLT'95), March 13–15, 1995
BOAT—optimistic decision tree construction

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
RainForest - A Framework for Fast Decision Tree Construction of Large Datasets

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
SPRINT: A Scalable Parallel Classifier for Data Mining

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Progressive Modeling

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
An extensible meta-learning approach for scalable and accurate inductive learning

An extensible meta-learning approach for scalable and accurate inductive learning

Mining concept-drifting data streams using ensemble classifiers

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

Most recent research of scalable inductive learning on very large dataset, decision tree construction in particular, focuses on eliminating memory constraints and reducing the number of sequential data scans. However, state-of-the-art decision tree construction algorithms still require multiple scans over the data set and use sophisticated control mechanisms and data structures. We first discuss a general inductive learning framework that scans the dataset exactly once. Then, we propose an extension based on Hoeffding's inequality that scans the dataset less than once. Our frameworks are applicable to a wide range of inductive learners.