Distribution kernels based on moments of counts

Authors:
Corinna Cortes;Mehryar Mohri
Affiliations:
Google Labs, Broadway, New York, NY;AT&T Labs - Research, Shannon Laboratory, Park Avenue, Florham Park, NJ
Venue:
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Year:
2004

Citing 5
Cited 2

A training algorithm for optimal margin classifiers. COLT '92 Proceedings of the fifth annual workshop on Computational learning theory
The nature of statistical learning theory
Support-Vector Networks. Machine Learning
The design principles of a weighted finite-state transducer library. Theoretical Computer Science - Special issue on implementing automata
Finite-state transducers in language and speech processing. Computational Linguistics

Rational Kernels: Theory and Algorithms. The Journal of Machine Learning Research
A general weighted grammar library. CIAA'04 Proceedings of the 9th international conference on Implementation and Application of Automata

Abstract

Many applications in text and speech processing require the analysis of distributions of variable-length sequences. We recently introduced a general kernel framework, rational kernels, to extend kernel methods to the analysis of such variable-length sequences or, more generally, weighted automata. These kernels are efficient to compute and have been successfully used in applications such as spoken-dialog classification using Support Vector Machines.

However, the rational kernels previously introduced do not fully encompass distributions over alternate sequences. Prior similarity measures between two weighted automata are based only on the expected counts of co-occurring subsequences and ignore similarities (or dissimilarities) in higher-order moments of the distributions of these counts.

In this paper, we introduce a new family of rational kernels, moment kernels, that precisely exploit this additional information. These kernels are distribution kernels based on moments of counts of strings. We describe efficient algorithms to compute moment kernels and apply them to several difficult spoken-dialog classification tasks. Our experiments show that using the second moment of the counts of n-gram sequences consistently improves the classification accuracy in these tasks.
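
To make the contrast between expected counts and higher-order moments concrete, here is a minimal Python sketch. It is not the paper's algorithm: the abstract describes efficient computation over weighted automata, whereas this toy version assumes each distribution is given as an explicit list of (token sequence, probability) pairs. It also assumes the kernel is the sum, over n-grams, of the product of the m-th moments of their counts under the two distributions. The function names ngram_counts, count_moments, and moment_kernel are illustrative only.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count how often each n-gram (as a tuple) occurs in one token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def count_moments(dist, n, m):
    """Return the m-th moment of every n-gram count under `dist`.

    `dist` is a list of (tokens, probability) pairs, an assumed toy
    stand-in for a weighted automaton over alternative sequences.
    """
    moments = {}
    for tokens, prob in dist:
        for gram, count in ngram_counts(tokens, n).items():
            moments[gram] = moments.get(gram, 0.0) + prob * count ** m
    return moments


def moment_kernel(dist_a, dist_b, n, m):
    """Assumed kernel form: sum over shared n-grams of the product of m-th moments."""
    moments_a = count_moments(dist_a, n, m)
    moments_b = count_moments(dist_b, n, m)
    return sum(value * moments_b.get(gram, 0.0) for gram, value in moments_a.items())


# Two toy distributions with identical expected unigram counts
# but different second moments of those counts.
dist_a = [(["a", "a"], 0.5), (["b", "b"], 0.5)]
dist_b = [(["a", "b"], 1.0)]

for m in (1, 2):
    k_aa = moment_kernel(dist_a, dist_a, 1, m)
    k_ab = moment_kernel(dist_a, dist_b, 1, m)
    k_bb = moment_kernel(dist_b, dist_b, 1, m)
    # Squared distance induced by the kernel between the two distributions.
    print(m, k_aa - 2 * k_ab + k_bb)
```

In this toy case the first-moment kernel (m = 1) puts the two distributions at distance zero, since every unigram has expected count 1 under both, while the second-moment kernel (m = 2) separates them.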