On the Computational Complexity of Optimal Multisplitting

  • Authors:
  • Tapio Elomaa, Juho Rousu

  • Affiliations:
  • Department of Computer Science, P. O. Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, Finland (e-mail: elomaa@cs.helsinki.fi / rousu@cs.helsinki.fi)

  • Venue:
  • Fundamenta Informaticae - Intelligent Systems
  • Year:
  • 2001

Abstract

The need to partition or discretize numeric value ranges arises in machine learning and data mining algorithms. This subtask is a potential time-consumption bottleneck, since the number of candidate partitions is exponential in the number of possible cut points in the range. Thus, many heuristic algorithms have been proposed for this task. Recently, the efficiency of optimal multisplitting has improved dramatically, due to the introduction of linear-time algorithms for training error minimization and quadratic-time generic algorithms. Whether these efficient algorithms are the best obtainable is not yet known. In this paper, we probe the inherent complexity of the multisplitting problem. We relate results obtained for similar problems in computational geometry and string matching to the multisplitting task. Subquadratic optimization algorithms in computational geometry rely on the monotonicity of the optimized function. We show by counterexamples that the widely used evaluation functions Training Set Error and Average Class Entropy do not satisfy the kind of monotonicity that facilitates subquadratic-time optimization. However, we also show that the Training Set Error function can be decomposed into monotonic subproblems, one per class, which explains its linear-time optimization. Finally, we review recently developed techniques for speeding up optimal multisplitting.
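To make the quadratic-time generic approach mentioned in the abstract concrete, the following is a minimal dynamic-programming sketch (not the paper's own algorithm) that minimizes Training Set Error over multisplits of a sorted numeric attribute. It assumes splits are allowed only between examples with distinct attribute values; the naive interval-error recomputation makes this sketch slower than the optimized algorithms the paper discusses, and it is written for clarity, not efficiency.

```python
from collections import Counter

def interval_error(labels, i, j):
    """Training Set Error of the interval labels[i:j]:
    interval size minus the count of its majority class."""
    counts = Counter(labels[i:j])
    return (j - i) - max(counts.values())

def optimal_multisplit(values, labels, k):
    """Minimum total Training Set Error over partitions of the
    value-sorted example sequence into at most k intervals.
    Cuts are only placed between examples with distinct values."""
    n = len(values)
    INF = float("inf")

    def cut_allowed(i):
        # a new interval may start at position i only on a value boundary
        return i == 0 or values[i - 1] != values[i]

    # best[m][j] = min error partitioning the first j examples into m intervals
    best = [[INF] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for m in range(1, k + 1):
        for j in range(1, n + 1):
            for i in range(j):
                if best[m - 1][i] < INF and cut_allowed(i):
                    cand = best[m - 1][i] + interval_error(labels, i, j)
                    if cand < best[m][j]:
                        best[m][j] = cand
    return min(best[m][n] for m in range(1, k + 1))
```

For example, with values `[1, 2, 3, 4, 5, 6]` and labels `a a b b a a`, a three-way split achieves zero error while a single interval leaves the two minority examples misclassified. The per-class decomposition of Training Set Error noted in the abstract is what lets the specialized linear-time algorithm avoid this kind of nested minimization.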