Optimizing Feature Sets for Structured Data

  • Authors:
  • Ulrich Rückert;Stefan Kramer

  • Affiliations:
  • Institut für Informatik/I12, Technische Universität München, Boltzmannstr. 3, D-85748 Garching b. München, Germany;Institut für Informatik/I12, Technische Universität München, Boltzmannstr. 3, D-85748 Garching b. München, Germany

  • Venue:
  • ECML '07 Proceedings of the 18th European conference on Machine Learning
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Choosing a suitable feature representation for structured data is a non-trivial task due to the vast number of potential candidates. Ideally, one would like to pick a small, but informative set of structural features, each providing complementary information about the instances. We frame the search for a suitable feature set as a combinatorial optimization problem. For this purpose, we define a scoring function that favors features that are as dissimilar as possible to all other features. The score is used in a stochastic local search (SLS) procedure to maximize the diversity of a feature set. In experiments on small molecule data, we investigate the effectiveness of a forward selection approach with two different linear classification schemes.