Classification Using Streaming Random Forests

Authors:
Hanady Abdulsalam;David B. Skillicorn;Patrick Martin
Affiliations:
Kuwait University, Kuwait;Queen's University, Kingston;Queen's University, Kingston
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2011

Abstract

We consider the problem of data stream classification, where the data arrive in a conceptually infinite stream, and the opportunity to examine each record is brief. We introduce a stream classification algorithm that is online, running in amortized O(1) time, able to handle intermittent arrival of labeled records, and able to adjust its parameters to respond to changing class boundaries (“concept drift”) in the data stream. In addition, when blocks of labeled data are short, the algorithm is able to judge internally whether the quality of models updated from them is good enough for deployment on unlabeled records, or whether further labeled records are required. Unlike most proposed stream-classification algorithms, it can handle multiple target classes. Experimental results on real and synthetic data show that accuracy is comparable to that of a conventional classification algorithm that sees all of the data at once and is able to make multiple passes over it.
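
The processing pattern described in the abstract (labeled and unlabeled records interleaved in one stream, constant work per record, and an internal check on whether the model is ready for deployment) can be illustrated with a small Python sketch. This is not the authors' algorithm. The nearest-centroid model stands in for the streaming random forest, and the class name `GatedStreamClassifier`, the parameters `window` and `min_accuracy`, and the test-then-train accuracy gate are all assumptions made for illustration. The sketch also makes no attempt to handle concept drift.

```python
from collections import deque


class GatedStreamClassifier:
    """Toy stand-in (NOT the paper's method): an online nearest-centroid
    model that only answers unlabeled records once its recent accuracy
    on labeled records is high enough."""

    def __init__(self, window=50, min_accuracy=0.8):
        self.sums = {}    # class label -> per-feature running sums
        self.counts = {}  # class label -> number of labeled records seen
        # Outcomes (1 = correct, 0 = wrong) of the most recent
        # test-then-train checks; bounded, so memory stays constant.
        self.recent = deque(maxlen=window)
        self.min_accuracy = min_accuracy  # hypothetical deployment threshold

    def _predict(self, x):
        """Return the label of the closest class centroid, or None."""
        best_label, best_dist = None, float("inf")
        for label, sums in self.sums.items():
            n = self.counts[label]
            dist = sum((xi - s / n) ** 2 for xi, s in zip(x, sums))
            if dist < best_dist:
                best_label, best_dist = label, dist
        return best_label

    def ready(self):
        """True once a full window of checks meets the accuracy threshold."""
        if len(self.recent) < self.recent.maxlen:
            return False
        return sum(self.recent) / len(self.recent) >= self.min_accuracy

    def process(self, x, label=None):
        """Handle one record. Unlabeled: return a prediction, or None if the
        model is not yet trusted. Labeled: test, then train, return None."""
        if label is None:
            return self._predict(x) if self.ready() else None
        if self.sums:  # test on the record before learning from it
            self.recent.append(1 if self._predict(x) == label else 0)
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
            self.counts[label] = 0
        self.sums[label] = [s + xi for s, xi in zip(self.sums[label], x)]
        self.counts[label] += 1
        return None
```

Each call to `process` costs time proportional to the number of classes times the number of features, independent of how many records have already been seen. Returning `None` for an unlabeled record is this sketch's way of signalling that further labeled records are required before deployment.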