Handling different categories of concept drifts in data streams using distributed GP

Authors:
Gianluigi Folino;Giuseppe Papuzzo
Affiliations:
Institute for High Performance Computing and Networking, CNR-ICAR;Institute for High Performance Computing and Networking, CNR-ICAR
Venue:
EuroGP'10 Proceedings of the 13th European conference on Genetic Programming
Year:
2010



Abstract

Using Genetic Programming (GP) to classify data streams is problematic because GP is slow compared with traditional single-solution techniques. However, the availability of cheaper and better-performing distributed and parallel architectures makes it possible to tackle complex problems that were previously hard to solve because of the large amount of time required. This work presents a general framework, based on a distributed GP ensemble algorithm, for coping with different types of concept drift when classifying large data streams. The framework detects changes very efficiently, using only a detection function computed on the incoming unclassified data. The distributed GP algorithm is run only when a change is detected, in order to improve classification accuracy, which limits the overhead associated with using a population-based method. Real-world data streams may exhibit drifts of different types. The detection function introduced here, based on the self-similarity fractal dimension, makes it possible to cope with the main types of drift in a very short time, as demonstrated by initial experiments on artificial datasets. Furthermore, given an adequate number of resources, distributed GP can handle very frequent concept drifts.
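
The abstract does not spell out the detection function, so the following is only a minimal sketch of the general idea: estimate a fractal dimension on a window of unlabeled feature vectors and flag a drift when it moves away from a reference value. It assumes features are scaled to [0, 1) and uses a box-counting estimate of the correlation dimension. The names fractal_dimension and drift_detected, the grid scales, and the threshold are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def fractal_dimension(points, scales=(0.5, 0.25, 0.125, 0.0625)):
    """Box-counting estimate of the correlation fractal dimension.

    Assumes every coordinate lies in [0, 1). For each grid size r the
    points are binned into cells, S(r) = sum of squared cell occupancy
    fractions is computed, and the dimension is taken as the
    least-squares slope of log S(r) against log r.
    """
    n = len(points)
    log_r, log_s = [], []
    for r in scales:
        cells = Counter(tuple(int(x / r) for x in p) for p in points)
        s = sum((count / n) ** 2 for count in cells.values())
        log_r.append(math.log(r))
        log_s.append(math.log(s))
    mean_r = sum(log_r) / len(log_r)
    mean_s = sum(log_s) / len(log_s)
    num = sum((a - mean_r) * (b - mean_s) for a, b in zip(log_r, log_s))
    den = sum((a - mean_r) ** 2 for a in log_r)
    return num / den


def drift_detected(reference, window, threshold=0.3):
    """Flag a drift when the dimension of the new window differs from
    the reference window by more than `threshold` (an assumed value)."""
    change = abs(fractal_dimension(window) - fractal_dimension(reference))
    return change > threshold
```

In a stream setting, drift_detected would be called on each new block of unclassified tuples, and only a positive result would trigger the (expensive) distributed GP retraining step described above. No class labels are needed for the check itself.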