On evaluating stream learning algorithms
Machine Learning
Learning from data streams is a research area of increasing importance, and many stream learning algorithms have been developed. Most of them learn decision models that continuously evolve over time, run in resource-aware environments, and detect and react to changes in the environment generating the data. One important issue, not yet adequately addressed, is the design of experimental work to evaluate and compare decision models that evolve over time: there are no gold standards for assessing performance in non-stationary environments. This paper proposes a general framework for assessing predictive stream learning algorithms. We advocate the use of predictive sequential methods for error estimation: the prequential error. The prequential error allows us to monitor the evolution of the performance of models that evolve over time; nevertheless, it is known to be a pessimistic estimator in comparison to holdout estimates. To obtain more reliable estimates we need a forgetting mechanism; two viable alternatives are sliding windows and fading factors. We observe that the prequential error converges to a holdout estimate when computed over a sliding window or using fading factors. We present illustrative examples of the use of prequential error estimators, using fading factors, for the tasks of: i) assessing the performance of a learning algorithm; ii) comparing learning algorithms; iii) hypothesis testing using the McNemar test; and iv) change detection using the Page-Hinkley test. In all these tasks, the prequential error estimated using fading factors provides reliable estimates. In comparison to sliding windows, fading factors are faster and memoryless, a key requirement for streaming applications. This paper is a contribution to the discussion of good practices in performance assessment when learning dynamic models that evolve over time.
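Two of the abstract's central ideas — the prequential error computed with a fading factor, and change detection with the Page-Hinkley test — can be sketched in a few lines of code. This is an illustrative sketch using the standard textbook formulations of both statistics, not the paper's own implementation; the function and parameter names (`prequential_error_fading`, `alpha`, `delta`, `lam`) are chosen here for clarity.

```python
def prequential_error_fading(stream, predict, update, alpha=0.995):
    """Prequential (test-then-train) 0/1 error with fading factor alpha.

    Each example is first used to test the current model, then to update it.
    The fading factor discounts old losses geometrically, so the estimator
    is memoryless: only two running sums are kept, no window of examples.
    """
    faded_loss = 0.0   # discounted sum of losses
    faded_count = 0.0  # discounted number of examples
    estimates = []
    for x, y in stream:
        loss = 0.0 if predict(x) == y else 1.0  # test first...
        faded_loss = loss + alpha * faded_loss
        faded_count = 1.0 + alpha * faded_count
        estimates.append(faded_loss / faded_count)
        update(x, y)                            # ...then train
    return estimates


class PageHinkley:
    """Page-Hinkley test: detects an increase in the mean of a signal
    (e.g. the prequential error) by tracking the cumulative deviation
    from the running mean and comparing it against its minimum."""

    def __init__(self, delta=0.005, lam=50.0):
        self.delta = delta   # tolerated magnitude of fluctuations
        self.lam = lam       # detection threshold
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0       # cumulative deviation statistic
        self.min_cum = 0.0   # minimum of the statistic seen so far

    def add(self, x):
        """Feed one observation; return True when a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.lam
```

As a usage sketch, feeding the per-example prequential losses into `PageHinkley.add` raises an alarm shortly after the error rate jumps, while a stable error rate keeps the statistic near its minimum and no alarm fires.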