Kernel methods for learning languages

Authors:
Leonid (Aryeh) Kontorovich; Corinna Cortes; Mehryar Mohri
Affiliations:
Department of Mathematics, Weizmann Institute of Science, Rehovot, 76100, Israel; Google Research, 76 Ninth Avenue, New York, NY 10011, United States; Courant Institute of Mathematical Sciences, 251 Mercer Street, New York, NY 10012, United States and Google Research, 76 Ninth Avenue, New York, NY 10011, United States
Venue:
Theoretical Computer Science
Year:
2008

Abstract

This paper studies a novel paradigm for learning formal languages from positive and negative examples: strings are mapped to an appropriate high-dimensional feature space, and a separating hyperplane is learned in that space. Such mappings can often be represented flexibly with string kernels, with the additional benefit of computational efficiency. The paradigm examined here can thus be viewed as that of using kernel methods for learning languages. We initiate the study of the linear separability of automata and languages by examining the rich class of piecewise-testable languages. We introduce a subsequence feature mapping to a Hilbert space and prove that piecewise-testable languages are linearly separable in that space. The proof makes use of word-combinatorial results relating to subsequences. We also show that the positive definite symmetric kernel associated with this embedding is a rational kernel and that it can be computed in quadratic time using general-purpose weighted automata algorithms. Our examination of the linear separability of piecewise-testable languages leads us to study the general problem of separability with other finite regular covers. We show that all languages linearly separable under a regular finite cover embedding, a generalization of the subsequence embedding we use, are regular. We give a general analysis of the use of support vector machines in combination with kernels to determine a separating hyperplane for languages, and we study the corresponding learning guarantees. Our analysis includes several additional linear separability results in abstract settings and partial characterizations of the linear separability of the family of all regular languages.
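To make the subsequence embedding concrete, here is a minimal sketch of the associated kernel. It assumes the 0/1 feature map in which the coordinate of phi(x) indexed by a string u is 1 exactly when u is a subsequence of x, so K(x, y) = <phi(x), phi(y)> counts the distinct common subsequences of x and y. Whether the empty string is counted is an assumption here; this sketch counts it, and dropping it subtracts 1. This is not the paper's weighted-automata construction. It is a hypothetical direct dynamic program over the product of two deterministic subsequence automata, and it runs in O(|x| |y| |Σ|) time, which is quadratic in the string lengths for a fixed alphabet. The helper names `next_state_table` and `subsequence_kernel` are illustrative.

```python
from typing import Dict, List


def next_state_table(s: str) -> List[Dict[str, int]]:
    """Transitions of the deterministic subsequence automaton of s.

    State i (0 <= i <= len(s)) means the first i symbols of s have been used.
    table[i][a] is the state reached by reading symbol a: one past the first
    occurrence of a at or after position i. A missing key means no transition.
    """
    n = len(s)
    table: List[Dict[str, int]] = [{} for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        table[i] = dict(table[i + 1])  # inherit the later occurrences
        table[i][s[i]] = i + 1         # the occurrence at position i is nearest
    return table


def subsequence_kernel(x: str, y: str) -> int:
    """Count distinct strings (empty string included) that are subsequences
    of both x and y, i.e. <phi(x), phi(y)> for the 0/1 subsequence embedding."""
    tx, ty = next_state_table(x), next_state_table(y)
    n, m = len(x), len(y)
    # paths[i][j] = number of paths leaving state (i, j) of the product
    # automaton, counting the empty path. The product is deterministic and
    # every state is final, so each path spells a distinct common subsequence.
    paths = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            total = 1  # stopping at (i, j) contributes one word
            for a, ni in tx[i].items():
                nj = ty[j].get(a)
                if nj is not None:
                    total += paths[ni][nj]  # ni > i and nj > j: already filled
            paths[i][j] = total
    return paths[0][0]


if __name__ == "__main__":
    print(subsequence_kernel("ab", "ab"))  # 4: "", "a", "b", "ab"
    print(subsequence_kernel("ab", "ba"))  # 3: "", "a", "b"
```

Because both automata are deterministic, each common subsequence corresponds to exactly one path from the start state (its leftmost embedding in each string). Counting paths in the product therefore counts each string once, without enumerating the exponentially many subsequences.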