A Kernel Two-Sample Test

  • Authors:
  • Arthur Gretton; Karsten M. Borgwardt; Malte J. Rasch; Bernhard Schölkopf; Alexander Smola

  • Affiliations:
  • MPI for Intelligent Systems, Tübingen, Germany; Machine Learning and Computational Biology Research Group, Max Planck Institutes Tübingen, Tübingen, Germany; State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, P.R. China; MPI for Intelligent Systems, Tübingen, Germany; Yahoo! Research, Santa Clara, CA and The Australian National University, Canberra, ACT, Australia

  • Venue:
  • The Journal of Machine Learning Research

  • Year:
  • 2012

Abstract

We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine whether two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distribution-free tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear-time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
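
The MMD described in the abstract is the supremum, over unit-norm RKHS functions, of the difference in expectations under the two distributions, and its square has a closed-form estimate built from kernel evaluations, which is what gives the quadratic-time cost. The sketch below is a minimal NumPy illustration of the standard quadratic-time unbiased MMD² estimate with a Gaussian kernel; the bandwidth value, toy data, and function names are assumptions made here for illustration, and the sketch does not compute the paper's distribution-free or asymptotic test thresholds.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between the rows of A and B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    # Quadratic-time unbiased estimate of MMD^2 between samples X ~ p and Y ~ q.
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    # Exclude diagonal terms so the within-sample averages are unbiased.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()

# Toy usage (assumed data, not the paper's experiments):
rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1.0, size=(200, 2))
X2 = rng.normal(0.0, 1.0, size=(200, 2))  # same distribution as X1
Y = rng.normal(0.5, 1.0, size=(200, 2))   # shifted distribution
print(mmd2_unbiased(X1, X2, sigma=1.0))   # close to zero
print(mmd2_unbiased(X1, Y, sigma=1.0))    # noticeably larger
```

In a full test one would compare such an estimate against a data-dependent threshold (e.g., from a large deviation bound, the asymptotic null distribution, or a permutation procedure) rather than reading the raw MMD² value directly.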