Maintaining optimal multi-way splits for numerical attributes in data streams

  • Authors:
  • Tapio Elomaa;Petri Lehtinen

  • Affiliations:
  • Department of Software Systems, Tampere University of Technology, Tampere, Finland;Department of Software Systems, Tampere University of Technology, Tampere, Finland

  • Venue:
  • PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining
  • Year:
  • 2008

Abstract

In the batch learning setting, for many commonly used attribute evaluation functions it suffices to consider only a reduced number of threshold candidates when discretizing the value range of a numerical attribute. We show that the same techniques are also efficiently applicable in the on-line learning scheme. Only constant time per example is needed to determine the changes in the data grouping. Hence, one can apply multi-way splits, e.g., in the standard approach to decision tree learning from data streams. We also briefly consider the modifications needed to cope with drifting concepts. Our empirical evaluation demonstrates that the reduction in threshold candidates is often high for the important attributes. In a data stream, logarithmic growth is observed in both the number of potential cut points and the reduced number of threshold candidates.
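To make the idea of a reduced set of threshold candidates concrete, the following is a minimal batch-mode sketch in Python. It assumes that the reduction in question is of the well-known boundary point kind, where a cut between two adjacent distinct values is kept only if the examples on its two sides do not all belong to one and the same class. The abstract does not specify the exact reduction or the on-line update procedure, so the function names `all_cut_points` and `boundary_points` are hypothetical and the code illustrates the general technique, not the authors' algorithm.

```python
from collections import defaultdict


def all_cut_points(values):
    """Every midpoint between adjacent distinct attribute values."""
    ordered = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]


def boundary_points(values, labels):
    """Reduced candidate set (assumption: boundary points).

    A midpoint between two adjacent distinct values is kept unless all
    examples having either of those values share a single class.
    """
    # Collect the set of class labels observed for each distinct value.
    classes_at = defaultdict(set)
    for v, y in zip(values, labels):
        classes_at[v].add(y)

    ordered = sorted(classes_at)
    cuts = []
    for left, right in zip(ordered, ordered[1:]):
        # Keep the cut if more than one class occurs around it.
        if len(classes_at[left] | classes_at[right]) > 1:
            cuts.append((left + right) / 2)
    return cuts


# Illustrative data only (not taken from the paper).
values = [1, 2, 3, 4, 5, 6]
labels = ["a", "a", "a", "b", "b", "a"]
print(all_cut_points(values))           # five potential cut points
print(boundary_points(values, labels))  # only the cuts where the class changes
```

This sketch re-sorts the whole sample on every call, so it says nothing about the constant per-example update cost claimed in the abstract; it only shows why the candidate set can be much smaller than the set of all potential cut points.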