On the Computational Complexity of Optimal Multisplitting

  • Authors:
  • Tapio Elomaa, Juho Rousu

  • Affiliations:
  • Department of Computer Science, P. O. Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, Finland (e-mail: elomaa@cs.helsinki.fi / rousu@cs.helsinki.fi)

  • Venue:
  • Fundamenta Informaticae - Intelligent Systems
  • Year:
  • 2001

Abstract

The need to partition or discretize numeric value ranges arises in machine learning and data mining algorithms. This subtask is a potential time-consumption bottleneck, since the number of candidate partitions is exponential in the number of possible cut points in the range. Thus, many heuristic algorithms have been proposed for this task. Recently, the efficiency of optimal multisplitting has improved dramatically, due to the introduction of linear-time algorithms for training error minimization and quadratic-time generic algorithms. Whether these efficient algorithms are the best obtainable is not yet known. In this paper, we probe the inherent complexity of the multisplitting problem. We relate results obtained for similar problems in computational geometry and string matching to the multisplitting task. Subquadratic optimization algorithms in computational geometry rely on the monotonicity of the optimized function. We show by counterexamples that the widely used evaluation functions Training Set Error and Average Class Entropy do not satisfy the kind of monotonicity that facilitates subquadratic-time optimization. However, we also show that the Training Set Error function can be decomposed into monotonic subproblems, one per class, which explains its linear-time optimization. Finally, we review recently developed techniques for speeding up optimal multisplitting.
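To make the quadratic-time generic approach mentioned in the abstract concrete, the following is a minimal dynamic-programming sketch (not the paper's own algorithm) that minimizes Training Set Error over multisplits of a sorted numeric attribute. It assumes splits are allowed only between examples with distinct attribute values; the naive interval-error recomputation makes this sketch slower than the optimized algorithms the paper discusses, and it is written for clarity, not efficiency.

```python
from collections import Counter

def interval_error(labels, i, j):
    """Training Set Error of the interval labels[i:j]:
    interval size minus the count of its majority class."""
    counts = Counter(labels[i:j])
    return (j - i) - max(counts.values())

def optimal_multisplit(values, labels, k):
    """Minimum total Training Set Error over partitions of the
    value-sorted example sequence into at most k intervals.
    Cuts are only placed between examples with distinct values."""
    n = len(values)
    INF = float("inf")

    def cut_allowed(i):
        # a new interval may start at position i only on a value boundary
        return i == 0 or values[i - 1] != values[i]

    # best[m][j] = min error partitioning the first j examples into m intervals
    best = [[INF] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for m in range(1, k + 1):
        for j in range(1, n + 1):
            for i in range(j):
                if best[m - 1][i] < INF and cut_allowed(i):
                    cand = best[m - 1][i] + interval_error(labels, i, j)
                    if cand < best[m][j]:
                        best[m][j] = cand
    return min(best[m][n] for m in range(1, k + 1))
```

For example, with values `[1, 2, 3, 4, 5, 6]` and labels `a a b b a a`, a three-way split achieves zero error while a single interval leaves the two minority examples misclassified. The per-class decomposition of Training Set Error noted in the abstract is what lets the specialized linear-time algorithm avoid this kind of nested minimization.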