Maintaining optimal multi-way splits for numerical attributes in data streams

Authors:
Tapio Elomaa; Petri Lehtinen
Affiliations:
Department of Software Systems, Tampere University of Technology, Tampere, Finland; Department of Software Systems, Tampere University of Technology, Tampere, Finland
Venue:
PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining
Year:
2008

Abstract

In the batch learning setting, for many commonly used attribute evaluation functions it suffices to consider only a reduced number of threshold candidates when discretizing the value range of a numerical attribute. We show that the same techniques are also efficiently applicable in the on-line learning scheme. Only constant time per example is needed to determine the changes in the data grouping. Hence, one can apply multi-way splits, e.g., in the standard approach to decision tree learning from data streams. We also briefly consider the modifications needed to cope with drifting concepts. Our empirical evaluation demonstrates that the reduction in threshold candidates is often high for the important attributes. In a data stream, logarithmic growth is observed in both the number of potential cut points and the reduced number of threshold candidates.
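
To illustrate what a "reduced number of threshold candidates" can mean, the following is a minimal sketch, not the authors' algorithm. It assumes the classical boundary-point criterion: a cut between two adjacent distinct attribute values can be skipped when all examples on both sides belong to one and the same class. The class name `StreamingBins` and its methods are hypothetical. The sketch recomputes the candidates from scratch and uses a sorted list, so it does not reproduce the constant-time update stated in the abstract.

```python
from bisect import bisect_left
from collections import Counter


class StreamingBins:
    """Hypothetical helper: sorted bins of distinct values with class counts."""

    def __init__(self):
        self.values = []  # sorted distinct attribute values seen so far
        self.counts = []  # counts[i] holds the class counts for values[i]

    def add(self, value, label):
        """Record one example from the stream."""
        i = bisect_left(self.values, value)
        if i == len(self.values) or self.values[i] != value:
            # New distinct value: open a bin (list insertion is linear time,
            # a simplification made for this sketch).
            self.values.insert(i, value)
            self.counts.insert(i, Counter())
        self.counts[i][label] += 1

    def boundary_cuts(self):
        """Midpoints between adjacent bins, skipping pairs that are both
        pure in the same class (assumed boundary-point criterion)."""
        cuts = []
        for i in range(len(self.values) - 1):
            left, right = self.counts[i], self.counts[i + 1]
            same_pure_class = (
                len(left) == 1 and len(right) == 1 and set(left) == set(right)
            )
            if not same_pure_class:
                cuts.append((self.values[i] + self.values[i + 1]) / 2)
        return cuts


bins = StreamingBins()
for value, label in [(1.0, "a"), (2.0, "a"), (3.0, "b"), (4.0, "b"), (2.0, "a")]:
    bins.add(value, label)
print(bins.boundary_cuts())  # [2.5]
```

In this toy stream, four distinct values give three potential cut points, but only the one between the class "a" bins and the class "b" bins remains a candidate under the assumed criterion.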