Journal of the ACM (JACM)
Theoretical Computer Science
Kernel Methods for Pattern Analysis
Subword histories and Parikh matrices
Journal of Computer and System Sciences
Learning linearly separable languages
ALT'06 Proceedings of the 17th international conference on Algorithmic Learning Theory
Languages as hyperplanes: grammatical inference with string kernels
ECML'06 Proceedings of the 17th European conference on Machine Learning
Planar languages and learnability
ICGI'06 Proceedings of the 8th international conference on Grammatical Inference: algorithms and applications
On the Injectivity of the Parikh Matrix Mapping
Fundamenta Informaticae
We describe methods of representing strings as real-valued vectors or matrices, and we show how to integrate two separate lines of enquiry: string kernels, developed in machine learning, and Parikh matrices [8], which have been studied intensively over the last few years as a powerful tool in combinatorics on words. String kernels are widely used in machine learning; they rely on analogous mappings into high-dimensional feature spaces based on the occurrences of subwords or factors. In this paper we show how string kernels can be used to construct two alternatives to Parikh matrices that overcome some of the limitations of the Parikh matrix construction. Both are morphisms from the free monoid to rings of real-valued matrices under multiplication: one is based on the subsequence kernel and the other on the gap-weighted string kernel. For the latter kernel we demonstrate that, for many values of the gap-weight hyperparameter, the resulting morphism is injective.
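For orientation, the classical Parikh matrix mapping that the abstract takes as its starting point can be sketched in a few lines. This is a minimal illustration of the standard construction, not of the two kernel-based morphisms the paper proposes: over an ordered alphabet of size k, each letter is sent to the (k+1) x (k+1) identity matrix with one extra 1 just above the diagonal, and a word is sent to the product of its letters' matrices. The function names (`parikh_generator`, `matmul`, `parikh_matrix`) are hypothetical and chosen only for this example.

```python
def parikh_generator(index, k):
    """Matrix of the letter at position `index` in an ordered alphabet of size k:
    the (k+1) x (k+1) identity with an extra 1 at (index, index + 1)."""
    size = k + 1
    m = [[1 if r == c else 0 for c in range(size)] for r in range(size)]
    m[index][index + 1] = 1
    return m


def matmul(a, b):
    """Plain product of two square integer matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[r][t] * b[t][c] for t in range(n)) for c in range(n)]
            for r in range(n)]


def parikh_matrix(word, alphabet):
    """Image of `word` under the Parikh matrix morphism for the ordered `alphabet`."""
    k = len(alphabet)
    result = [[1 if r == c else 0 for c in range(k + 1)] for r in range(k + 1)]
    for symbol in word:
        # Multiplying generator matrices in word order mirrors concatenation.
        result = matmul(result, parikh_generator(alphabet.index(symbol), k))
    return result


# Over the ordered alphabet a < b, the entries above the diagonal count
# occurrences of a, b, and ab as scattered subwords.
m1 = parikh_matrix("abba", "ab")
m2 = parikh_matrix("baab", "ab")
print(m1)        # [[1, 2, 2], [0, 1, 2], [0, 0, 1]]
print(m1 == m2)  # True: two distinct words share one matrix
```

The last two lines show one limitation of the classical construction: the words abba and baab are different but receive the same matrix, so the mapping is not injective. The abstract's injectivity claim for the gap-weighted construction addresses exactly this kind of collision.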