Efficient Multisplitting Revisited: Optima-Preserving Elimination of Partition Candidates
Data Mining and Knowledge Discovery
Discretization from data streams: applications to histograms and data mining
Proceedings of the 2006 ACM symposium on Applied computing
The time complexity of class-driven numerical range discretization algorithms depends on the number of cut point candidates. Previous analysis has shown that only a subset of all cut points, the segment borders, has to be taken into account in optimal discretization with respect to many goodness criteria. In this paper we show that inspecting segment borders alone suffices for optimizing any convex evaluation function. For strictly convex evaluation functions, inspecting all of them is also necessary, since the placement of neighboring cut points affects the optimality of a segment border. With the training set error function, which is not strictly convex, it suffices to inspect an even smaller set of cut point candidates, called alternations, when striving for an optimal partition. Conversely, we prove that failing to check an alternation may lead to a suboptimal discretization. We present a linear-time algorithm for finding all alternation points. The number of alternation points is typically much lower than the total number of cut points. In our experiments, running the discretization algorithm over the sequence of alternation points led to a significant speed-up.
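To illustrate the idea of pruning cut point candidates, the following is a minimal sketch of collecting segment borders. It is not the paper's algorithm. It assumes a definition the abstract does not spell out: a segment border is a cut point between two adjacent groups of equal attribute values ("bins") whose relative class distributions differ. The function name `segment_borders` and the choice of midpoints as cut values are hypothetical, chosen for illustration only. The alternation points mentioned in the abstract are a further subset and are not computed here.

```python
from collections import Counter
from fractions import Fraction
from itertools import groupby


def segment_borders(values, labels):
    """Return cut points between adjacent bins with differing class distributions.

    Assumption (not stated in the abstract): a bin is the set of examples
    sharing one attribute value, and a segment border separates two adjacent
    bins whose relative class frequencies are not identical.
    """
    # Sort examples by attribute value; after this step the scan is linear.
    pairs = sorted(zip(values, labels), key=lambda p: p[0])

    # Build one bin per distinct value, storing exact relative class frequencies.
    bins = []
    for value, group in groupby(pairs, key=lambda p: p[0]):
        counts = Counter(label for _, label in group)
        total = sum(counts.values())
        dist = {cls: Fraction(n, total) for cls, n in counts.items()}
        bins.append((value, dist))

    # Keep only the boundaries where the distribution changes.
    borders = []
    for (left_value, left_dist), (right_value, right_dist) in zip(bins, bins[1:]):
        if left_dist != right_dist:
            borders.append((left_value + right_value) / 2)
    return borders


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    ys = ["a", "a", "b", "b", "a", "b"]
    # Six distinct values give five possible cut points; fewer survive.
    print(segment_borders(xs, ys))
```

In this toy input, the cut points between values 1 and 2 and between 3 and 4 are dropped because the bins on either side have the same class distribution, so a discretization algorithm would only need to evaluate the remaining candidates.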