Linear-Time Preprocessing in Optimal Numerical Range Partitioning

  • Authors:
  • Tapio Elomaa; Juho Rousu

  • Affiliations:
  • Department of Computer Science, P.O. Box 26 (Teollisuuskatu 23), FIN-00014, University of Helsinki, Finland. elomaa@cs.helsinki.fi; Department of Computer Science, P.O. Box 26 (Teollisuuskatu 23), FIN-00014, University of Helsinki, Finland. rousu@cs.helsinki.fi

  • Venue:
  • Journal of Intelligent Information Systems - Special issue: A survey of research questions for intelligent information systems in education
  • Year:
  • 2002

Abstract

Only a subset of the boundary points, the segment borders, has to be taken into account when searching for the optimal multisplit of a numerical value range with respect to the most commonly used attribute evaluation functions of classification learning algorithms. Segments and their borders can be found efficiently in a linear-time preprocessing step.

In this paper we expand the applicability of segment borders by showing that inspecting them alone suffices in optimizing any convex evaluation function. For strictly convex evaluation functions, inspecting all segment borders is also necessary. These results are derived directly from Jensen's inequality.

We also study the evaluation function Training Set Error, which is not strictly convex. With that function, the data can be preprocessed into an even smaller number of cut point candidates, called alternations, when striving for an optimal partition. Examining all alternations also seems necessary since, analogously to strictly convex functions, the placement of neighboring cut points affects the optimality of an alternation. We empirically test the reduction in the number of cut point candidates that can be obtained for Training Set Error on real-world data.
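
The abstract does not spell out how segment borders are found, so the following Python sketch is only an illustration under stated assumptions. It assumes that a bin is the set of examples sharing one attribute value, and that a segment border lies between two adjacent bins whose relative class distributions differ. The function name `segment_borders` and the choice of midpoints as cut values are hypothetical, not taken from the paper.

```python
from collections import Counter
from fractions import Fraction
from itertools import groupby


def segment_borders(values, labels):
    """Return midpoints between adjacent bins whose relative class
    distributions differ (assumed definition of a segment border)."""
    # Sorting dominates the cost; the scans below are single passes over the data.
    pairs = sorted(zip(values, labels), key=lambda p: p[0])

    # Build one bin per distinct value, storing its relative class distribution.
    bins = []
    for value, group in groupby(pairs, key=lambda p: p[0]):
        counts = Counter(label for _, label in group)
        total = sum(counts.values())
        # Exact fractions avoid floating-point noise when comparing distributions.
        dist = {cls: Fraction(k, total) for cls, k in counts.items()}
        bins.append((value, dist))

    # Keep a cut point only where neighboring bins have different distributions.
    borders = []
    for (v_left, d_left), (v_right, d_right) in zip(bins, bins[1:]):
        if d_left != d_right:
            borders.append((v_left + v_right) / 2)
    return borders


values = [1, 1, 2, 2, 3, 4, 4, 5]
labels = ["a", "b", "a", "b", "a", "a", "b", "b"]
print(segment_borders(values, labels))  # [2.5, 3.5, 4.5]
```

In this toy sample the bins at values 1 and 2 have the same class distribution, so the cut between them is dropped and only three of the four possible cut points remain as candidates.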