The need to partition or discretize numeric value ranges arises in machine learning and data mining algorithms. This subtask is a potential time bottleneck, since the number of candidate partitions is exponential in the number of possible cut points in the range. Thus, many heuristic algorithms have been proposed for this task. Recently, the efficiency of optimal multisplitting has improved dramatically, due to the introduction of linear-time algorithms for training error minimization and quadratic-time generic algorithms. It is not yet known whether these efficient algorithms are the best obtainable. In this paper, we probe the inherent complexity of the multisplitting problem. We relate results obtained for similar problems in computational geometry and string matching to the multisplitting task. Subquadratic optimization algorithms in computational geometry rely on the monotonicity of the optimized function. We show by counterexamples that the widely used evaluation functions Training Set Error and Average Class Entropy do not satisfy the kind of monotonicity that facilitates subquadratic-time optimization. However, we also show that the Training Set Error function can be decomposed into monotonic subproblems, one per class, which explains its linear-time optimization. Finally, we review recently developed techniques for speeding up optimal multisplitting.
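The linear-time training error minimization mentioned above can be illustrated with a minimal sketch; this is a hypothetical helper for the simplest case (a single binary split of examples sorted by the numeric attribute), not the paper's multisplitting algorithm. Maintaining running per-class counts on each side of the cut lets every candidate cut point be evaluated in constant amortized time per step, so the scan is linear in the number of examples for a fixed number of classes.

```python
from collections import Counter

def best_binary_split(labels):
    """Find the cut index minimizing Training Set Error for a binary split.

    labels: class labels of the examples, sorted by the numeric attribute.
    Returns (cut, error): cut i places labels[:i] in the left interval and
    labels[i:] in the right one; error is the total number of examples
    misclassified when each interval predicts its majority class.
    """
    n = len(labels)
    right = Counter(labels)  # class counts to the right of the cut
    left = Counter()         # class counts to the left of the cut
    # Baseline: no split at all (cut 0), majority class over everything.
    best_cut, best_err = 0, n - max(right.values())
    for i, lab in enumerate(labels, 1):
        # Move one example from the right interval to the left one.
        left[lab] += 1
        right[lab] -= 1
        err = (i - max(left.values())) + ((n - i) - max(right.values(), default=0))
        if err < best_err:
            best_cut, best_err = i, err
    return best_cut, best_err
```

For example, `best_binary_split(['a', 'a', 'a', 'b', 'b'])` returns `(3, 0)`: cutting after the third example separates the classes perfectly. In practice, cuts are only placed between examples with differing attribute values (boundary points), a pruning the paper's line of work relies on; the sketch omits that detail for brevity.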