A binary-classification-based metric between time-series distributions and its use in statistical and learning problems

Authors:
Daniil Ryabko;Jérémie Mary
Affiliations:
SequeL-INRIA, LIFL-CNRS, Université de Lille, Villeneuve d'Ascq, France;SequeL-INRIA, LIFL-CNRS, Université de Lille, Villeneuve d'Ascq, France
Venue:
The Journal of Machine Learning Research
Year:
2013

Citing 10
Cited 0

Support-Vector Networks

Machine Learning
Detecting change in data streams

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
A discriminative framework for clustering via similarity functions

STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
Compression-based methods for nonparametric prediction and estimation of some characteristics of time series

IEEE Transactions on Information Theory
Robust reductions from ranking to classification

COLT'07 Proceedings of the 20th annual conference on Learning theory
Nonparametric statistical inference for ergodic processes

IEEE Transactions on Information Theory
LIBSVM: A library for support vector machines

ACM Transactions on Intelligent Systems and Technology (TIST)
On the Relation between Realizable and Nonrealizable Cases of the Sequence Prediction Problem

The Journal of Machine Learning Research
Asymptotically optimal classification for multiple tests with empirically observed statistics

IEEE Transactions on Information Theory
Complexity-based induction systems: Comparisons and convergence theorems

IEEE Transactions on Information Theory

Quantified Score

Hi-index	0.00

Visualization

Abstract

A metric between time-series distributions is proposed that can be evaluated using binary classification methods, which were originally developed to work on i.i.d. data. It is shown how this metric can be used for solving statistical problems that are seemingly unrelated to classification and concern highly dependent time series. Specifically, the problems of time-series clustering, homogeneity testing and the three-sample problem are addressed. Universal consistency of the resulting algorithms is proven under most general assumptions. The theoretical results are illustrated with experiments on synthetic and real-world data.