Using Genetic Programming (GP) to classify data streams is problematic because GP is slow compared with traditional single-solution techniques. However, the availability of cheaper and better-performing distributed and parallel architectures makes it possible to tackle complex problems that were previously impractical because of the time they required. This work presents a general framework, based on a distributed GP ensemble algorithm, for coping with different types of concept drift when classifying large data streams. The framework detects changes very efficiently, using only a detection function computed on the incoming unclassified data. The distributed GP algorithm is therefore run only when a change is detected, in order to improve classification accuracy, which limits the overhead associated with a population-based method. Real-world data streams may present drifts of different types. The detection function introduced here, based on the self-similarity fractal dimension, makes it possible to cope with the main types of drift in a very short time, as demonstrated by initial experiments on artificial datasets. Furthermore, given an adequate number of resources, distributed GP can handle very frequent concept drifts.
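
To make the idea of a fractal-dimension detection function concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method of the paper: the abstract does not give the estimator or the decision rule, so the sketch substitutes a plain box-counting estimate of the fractal dimension and a fixed threshold on the difference between two windows of unlabeled points. The names `box_counting_dimension` and `drift_detected`, the grid scales, and the threshold value are all hypothetical.

```python
import math


def box_counting_dimension(points, scales=(2, 4, 8, 16)):
    """Estimate the box-counting dimension of points lying in the unit hypercube.

    Assumption: this is a simple stand-in for the self-similarity fractal
    dimension named in the abstract; the paper's estimator may differ.
    """
    log_k, log_count = [], []
    for k in scales:
        # Count the occupied grid cells of side 1/k.
        cells = {tuple(min(int(c * k), k - 1) for c in p) for p in points}
        log_k.append(math.log(k))
        log_count.append(math.log(len(cells)))
    # The dimension is the least-squares slope of log(count) against log(k).
    mean_x = sum(log_k) / len(log_k)
    mean_y = sum(log_count) / len(log_count)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_k, log_count))
    den = sum((x - mean_x) ** 2 for x in log_k)
    return num / den


def drift_detected(reference, window, threshold=0.3):
    """Flag a change when the two windows differ in estimated dimension.

    Assumption: the threshold is an arbitrary illustrative value.
    """
    gap = box_counting_dimension(reference) - box_counting_dimension(window)
    return abs(gap) > threshold


# Unlabeled points on a diagonal line (dimension close to 1).
line = [(i / 2000, i / 2000) for i in range(2000)]
# Unlabeled points filling the unit square (dimension close to 2).
square = [((i + 0.5) / 64, (j + 0.5) / 64) for i in range(64) for j in range(64)]

if drift_detected(line, square):
    # Only at this point would the costly distributed GP ensemble be retrained.
    print("change detected: retrain the ensemble")
```

The check uses only the feature vectors, never the class labels, which matches the abstract's point that detection works on incoming unclassified data and that the expensive GP step runs only after a detected change.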