Sparse substring pattern set discovery using linear programming boosting

  • Authors:
  • Kazuaki Kashihara;Kohei Hatano;Hideo Bannai;Masayuki Takeda

  • Affiliations:
  • Department of Informatics, Kyushu University;Department of Informatics, Kyushu University;Department of Informatics, Kyushu University;Department of Informatics, Kyushu University

  • Venue:
  • DS'10 Proceedings of the 13th international conference on Discovery science
  • Year:
  • 2010

Abstract

In this paper, we consider the problem of finding a small set of substring patterns that classifies the given documents well. We formulate the problem as a 1-norm soft margin optimization problem in which each dimension corresponds to a substring pattern. We then solve this problem using LPBoost together with an optimal substring discovery algorithm. Since the problem is a linear program, the resulting solution is likely to be sparse, which is useful for feature selection. We evaluate the proposed method on real data such as movie reviews.
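
To make the pattern-search step concrete, the following is a minimal sketch of the sub-problem a boosting round would hand to a substring discovery routine: given weights over the documents, find the substring whose presence/absence hypothesis has the largest weighted edge. It is not the algorithm from the paper. The brute-force enumeration is a stand-in for an optimal (and far more efficient) substring discovery algorithm, and the function name `best_substring`, the hypothesis form h_p(x) = +1 if p occurs in x and -1 otherwise, and the toy documents are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def best_substring(docs: Sequence[str], labels: Sequence[int],
                   weights: Sequence[float]) -> Tuple[str, float]:
    """Return the substring with the largest absolute weighted edge.

    Assumed hypothesis for a pattern p: h_p(x) = +1 if p occurs in x, else -1.
    The edge of p is sum_i weights[i] * labels[i] * h_p(docs[i]).
    """
    # Every non-empty substring of some document is a candidate pattern.
    # This brute-force enumeration is quadratic in document length and is
    # only meant for tiny inputs.
    candidates = {doc[i:j] for doc in docs
                  for i in range(len(doc))
                  for j in range(i + 1, len(doc) + 1)}
    best_pattern, best_edge = "", 0.0
    for p in sorted(candidates):  # sorted for deterministic tie-breaking
        edge = sum(w * y * (1 if p in x else -1)
                   for x, y, w in zip(docs, labels, weights))
        if abs(edge) > abs(best_edge):
            best_pattern, best_edge = p, edge
    return best_pattern, best_edge


if __name__ == "__main__":
    # Hypothetical toy data: two positive and two negative "reviews".
    docs = ["good movie", "great film", "bad movie", "awful film"]
    labels = [1, 1, -1, -1]
    weights = [0.25, 0.25, 0.25, 0.25]  # uniform weights over documents
    print(best_substring(docs, labels, weights))
```

In an LPBoost-style loop, the returned pattern would be added as a new column of the linear program, the program would be re-solved to update the document weights, and the search would be repeated until no pattern has an edge above the current LP objective value.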