Appropriate kernel functions for support vector machine learning with sequences of symbolic data

  • Authors:
  • Bram Vanschoenwinkel; Bernard Manderick

  • Affiliations:
  • Computational Modeling Lab, Vrije Universiteit Brussel, Brussel, Belgium; Computational Modeling Lab, Vrije Universiteit Brussel, Brussel, Belgium

  • Venue:
  • Proceedings of the First International Conference on Deterministic and Statistical Methods in Machine Learning
  • Year:
  • 2004

Abstract

In classification problems, machine learning algorithms often rely on the assumption that (dis)similar inputs lead to (dis)similar outputs. Two questions then naturally arise: what does it mean for two inputs to be similar, and how can this similarity be used in a learning algorithm? In support vector machines (SVMs), similarity between input examples is expressed implicitly by a kernel function that calculates inner products in a feature space. For numerical input examples the concept of an inner product is easy to define, but for discrete structures such as sequences of symbolic data it is less obvious. This article describes an approach to SVM learning for symbolic data that can serve as an alternative to the bag-of-words approach under certain circumstances. The bag-of-words approach first transforms the symbolic data into vectors of numerical data, which are then used as arguments for one of the standard kernel functions. In contrast, we propose kernels that operate on the symbolic data directly.
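
To make the idea concrete, here is a minimal sketch of an SVM trained with a kernel defined directly on symbolic sequences, with no intermediate numerical encoding exposed to the user. The position-wise matching kernel used here is an illustrative assumption, not one of the kernels proposed in the paper: counting the positions at which two equal-length sequences agree equals the inner product of their one-hot encodings, so the resulting Gram matrix is positive semi-definite and valid for SVM learning.

```python
# A minimal sketch (illustrative, not the paper's kernels): an SVM over
# symbolic sequences via a precomputed Gram matrix.
import numpy as np
from sklearn.svm import SVC

def matching_kernel(s, t):
    """Number of positions at which sequences s and t carry the same symbol."""
    assert len(s) == len(t), "this simple kernel assumes equal-length sequences"
    return sum(a == b for a, b in zip(s, t))

def gram_matrix(X, Y):
    """Pairwise kernel evaluations between two lists of sequences."""
    return np.array([[matching_kernel(s, t) for t in Y] for s in X], dtype=float)

# Toy data: classify short symbol sequences without numeric feature vectors.
X_train = ["abcab", "abcaa", "bbcab", "ccacc", "ccbcc", "cbacc"]
y_train = [0, 0, 0, 1, 1, 1]
X_test = ["abcbb", "ccccc"]

clf = SVC(kernel="precomputed")
clf.fit(gram_matrix(X_train, X_train), y_train)
print(clf.predict(gram_matrix(X_test, X_train)))  # expected: [0 1]
```

The design point mirrors the abstract: the learning algorithm only ever sees kernel values, so replacing `matching_kernel` with any other symmetric, positive semi-definite function on sequences changes the notion of similarity without touching the rest of the pipeline.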