Classifying negative and positive points by optimal box clustering

  • Authors:
  • Paolo Serafini

  • Venue:
  • Discrete Applied Mathematics
  • Year:
  • 2014

Abstract

In this paper, we address the problem of classifying positive and negative data with the technique known as box clustering. A box is homogeneous if it contains only positive points or only negative points. Box clustering means finding a family of homogeneous boxes that jointly contain all the positive (negative) points and no point of the opposite class. We first consider the problem of finding a family with the minimum number of boxes. We then refine this problem: among the families with the minimum number of boxes, we seek one whose boxes cover the points as many times as possible. We call this the maximum redundancy problem. We model both problems as set covering problems solved by column generation. The pricing problem is a Maximum Box problem. Although this problem is NP-hard, a combinatorial algorithm that performs well is available in the literature. Since the pricing must also be carried out during the branch-and-bound search for the set covering problem, we also consider how the pricing has to be modified to account for the branching constraints. The computational results show that the set covering approach behaves well.
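To make the minimum-box problem concrete, the following is a minimal sketch in Python. It is not the column generation method of the paper: it enumerates homogeneous boxes by brute force and then searches for the smallest covering family, which is only feasible on tiny instances. It assumes that boxes are closed and axis-parallel, which the abstract does not state, and the function names (`bounding_box`, `homogeneous_boxes`, `min_box_cover`) are hypothetical.

```python
from itertools import combinations


def bounding_box(points):
    """Smallest closed axis-parallel box containing the given points (assumed box model)."""
    dims = range(len(points[0]))
    lo = tuple(min(p[d] for p in points) for d in dims)
    hi = tuple(max(p[d] for p in points) for d in dims)
    return lo, hi


def contains(box, p):
    """True if point p lies inside the closed box."""
    lo, hi = box
    return all(l <= x <= h for l, x, h in zip(lo, p, hi))


def homogeneous_boxes(positives, negatives):
    """Enumerate homogeneous boxes, keyed by the set of positive indices they cover.

    Brute force over all subsets of positives: exponential, for illustration only.
    """
    boxes = {}
    for r in range(1, len(positives) + 1):
        for subset in combinations(positives, r):
            box = bounding_box(subset)
            if any(contains(box, q) for q in negatives):
                continue  # box contains a negative point, so it is not homogeneous
            covered = frozenset(
                i for i, p in enumerate(positives) if contains(box, p)
            )
            boxes[covered] = box
    return boxes


def min_box_cover(positives, negatives):
    """Smallest family of homogeneous boxes covering every positive point."""
    boxes = homogeneous_boxes(positives, negatives)
    everything = frozenset(range(len(positives)))
    candidates = list(boxes)
    for k in range(1, len(positives) + 1):
        for family in combinations(candidates, k):
            if frozenset().union(*family) == everything:
                return [boxes[c] for c in family]
    return []


positives = [(0, 0), (1, 1), (4, 0), (5, 1)]
negatives = [(2.5, 0.5)]
cover = min_box_cover(positives, negatives)
print(len(cover), "boxes:", cover)
```

In this example the negative point lies between the two groups of positive points, so no single homogeneous box can cover all four positives and the search settles on two boxes. The set covering formulation in the paper addresses the same covering condition, but generates boxes on demand through the pricing problem instead of enumerating them.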