For conventional machine learning classification algorithms, handling numeric attributes is relatively straightforward. Unsupervised and supervised solutions exist that either segment the data into predefined bins or sort the data and search for the best split points. Unfortunately, none of these solutions carries over particularly well to a data stream environment. Solutions for data streams have been proposed by several authors, but as yet none have been compared empirically. In this paper we investigate a range of methods for multi-class tree-based classification where the handling of numeric attributes takes place as the tree is constructed. To this end, we extend an existing approach based on simple Gaussian approximation. We then compare this method with four approaches from the literature, arriving at eight final algorithm configurations for testing. The solutions cover a range of options, from perfectly accurate and memory intensive to highly approximate. All methods are tested using the Hoeffding tree classification algorithm. Surprisingly, the experimental comparison shows that the most approximate methods produce the most accurate trees by allowing for faster tree growth.
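To make the idea of a Gaussian approximation concrete, the following is a minimal sketch, not the paper's implementation. It assumes one estimator per class and per numeric attribute, updated incrementally with Welford's method, and uses the fitted normal distribution to estimate how many values fall on each side of a candidate split. The names `GaussianEstimator`, `count_below`, and `hoeffding_bound` are hypothetical; the bound itself is the standard Hoeffding inequality used by Hoeffding trees.

```python
import math


class GaussianEstimator:
    """Running mean and variance of one numeric attribute for one class.

    Hypothetical helper: stores three numbers instead of the observed values.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        # Welford's one-pass update of mean and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance; defined as 0.0 until two values have been seen.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def count_below(self, split):
        """Estimated number of seen values <= split under the Gaussian fit."""
        if self.n == 0:
            return 0.0
        std = math.sqrt(self.variance())
        if std == 0.0:
            # Degenerate case: all mass sits at the mean.
            return float(self.n) if self.mean <= split else 0.0
        z = (split - self.mean) / (std * math.sqrt(2.0))
        return self.n * 0.5 * (1.0 + math.erf(z))


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


if __name__ == "__main__":
    est = GaussianEstimator()
    for value in [1.0, 2.0, 3.0, 4.0, 5.0]:
        est.update(value)
    print(est.mean, est.variance(), est.count_below(3.0))
    print(hoeffding_bound(1.0, 1e-7, 1000))
```

Under these assumptions, a tree learner would keep one such estimator per class, call `count_below` for each candidate split point to build approximate class distributions on either side, and score the split from those counts. The memory cost is constant per class and attribute, which is what makes this kind of method "highly approximate" in the sense used above.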