Some Alternatives to Parikh Matrices Using String Kernels

  • Authors:
  • Alexander Clark; Chris Watkins

  • Affiliations:
  • Department of Computer Science, Royal Holloway, University of London, Egham TW20 0EX, United Kingdom (both authors; corresponding author: Alexander Clark). E-mail: {alexc,chrisw}@cs.rhul.ac.uk

  • Venue:
  • Fundamenta Informaticae
  • Year:
  • 2008

Abstract

We describe methods of representing strings as real-valued vectors or matrices, and we show how to integrate two separate lines of enquiry: string kernels, developed in machine learning, and Parikh matrices [8], which have been studied intensively over the last few years as a powerful tool in combinatorics on words. In machine learning, string kernels are widely used; they rely on analogous mappings into high-dimensional feature spaces based on the occurrences of subwords or factors. In this paper we show how one can use string kernels to construct two alternatives to Parikh matrices that overcome some of the limitations of the Parikh matrix construction. These are morphisms from the free monoid to rings of real-valued matrices under multiplication: one is based on the subsequence kernel and the other on the gap-weighted string kernel. For the latter kernel we demonstrate that for many values of the gap-weight hyperparameter the resulting morphism is injective.
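
For readers unfamiliar with the baseline construction, the following is a minimal sketch of the standard Parikh matrix mapping, not of the two alternatives proposed in the paper (the abstract does not give their definitions). It assumes an ordered alphabet a_1 < ... < a_k, maps each letter a_q to the (k+1) x (k+1) identity matrix with one extra 1 in row q, column q+1, and extends this to words by matrix multiplication. The function names (parikh_matrix, count_subsequence) are hypothetical and chosen for illustration only.

```python
def identity(n):
    """Return the n x n identity matrix as a list of lists of ints."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Multiply two square integer matrices of the same size."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def parikh_matrix(word, alphabet):
    """Standard Parikh matrix of `word` over the ordered `alphabet`.

    Each letter maps to the identity plus a single 1 just above the
    diagonal; a word maps to the product of its letters' matrices,
    so the mapping is a monoid morphism.
    """
    size = len(alphabet) + 1
    position = {symbol: q for q, symbol in enumerate(alphabet)}
    result = identity(size)
    for symbol in word:
        generator = identity(size)
        q = position[symbol]
        generator[q][q + 1] = 1
        result = matmul(result, generator)
    return result


def count_subsequence(word, pattern):
    """Count occurrences of `pattern` as a scattered subword of `word`."""
    counts = [1] + [0] * len(pattern)
    for symbol in word:
        # Walk right to left so each letter of `word` is used once per match.
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == symbol:
                counts[j] += counts[j - 1]
    return counts[len(pattern)]


if __name__ == "__main__":
    for w in ("abba", "baab"):
        print(w, parikh_matrix(w, "ab"), count_subsequence(w, "ab"))
```

Over the alphabet {a, b}, the entries above the diagonal record the number of a's, the number of b's, and the number of occurrences of ab as a scattered subword. The words abba and baab agree on all three counts and therefore receive the same matrix, which shows that this mapping is not injective; this is presumably one of the limitations the abstract refers to, given its emphasis on injectivity of the gap-weighted morphism.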