Complexity-penalized estimation of minimum volume sets for dependent data

Authors:
J. Di; E. Kolaczyk
Venue:
Journal of Multivariate Analysis
Year:
2010

Citing 11

Elements of information theory
Choosability and fractional chromatic numbers. Proceedings of an international symposium on Graphs and combinatorics
Nonparametric conditional predictive regions for time series. Computational Statistics & Data Analysis
Diagnosing network-wide traffic anomalies. Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications
A Classification Framework for Anomaly Detection. The Journal of Machine Learning Research
Estimating the Support of a High-Dimensional Distribution. Neural Computation
Learning Minimum Volume Sets. The Journal of Machine Learning Research
Consistency and Convergence Rates of One-Class SVMs and Related Algorithms. The Journal of Machine Learning Research
Using Local Dependencies within Batches to Improve Large Margin Classifiers. The Journal of Machine Learning Research
Minimum complexity regression estimation with weakly dependent observations. IEEE Transactions on Information Theory - Part 2
Minimax-optimal classification with dyadic decision trees. IEEE Transactions on Information Theory

Abstract

A minimum volume (MV) set, at level $\alpha$, is a set $G_\alpha^*$ having minimum volume among all those sets containing at least $\alpha$ probability mass. MV sets provide a natural notion of the 'central mass' of a distribution and, as such, have recently become popular as a tool for the detection of anomalies in multivariate data. Motivated by the fact that anomaly detection problems frequently arise in settings with temporally indexed measurements, we propose here a new method for the estimation of MV sets from dependent data. Our method is based on the concept of complexity-penalized estimation, extending recent work of Scott and Nowak for the case of independent and identically distributed measurements, and has both desirable theoretical properties and a practical implementation. Of particular note is the fact that, for a large class of stochastic processes, choice of an appropriate complexity penalty reduces to the selection of a single tuning parameter, which represents the data dependency of the underlying stochastic process. While in reality the dependence structure is unknown, we offer a data-dependent method for selecting this parameter, based on subsampling principles. Our work is motivated by and illustrated through an application to the detection of anomalous traffic levels in Internet traffic time series.
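
For concreteness, the verbal definition of an MV set given in the abstract can be written as a constrained minimization. This is only a sketch of that definition: the symbols $P$ (the distribution of the data), $\mu$ (the reference measure used for volume, for example Lebesgue measure), and $\mathcal{G}$ (the class of candidate measurable sets) are assumptions introduced here for illustration and do not appear in the abstract.

```latex
% Assumed notation (not from the abstract):
%   P           distribution of the observations
%   \mu         reference measure defining "volume" (e.g. Lebesgue measure)
%   \mathcal{G} class of candidate measurable sets
G_\alpha^* \in \operatorname*{arg\,min}_{G \in \mathcal{G}}
  \bigl\{ \mu(G) \;:\; P(G) \ge \alpha \bigr\},
\qquad 0 < \alpha < 1 .
```

Under this reading, an observation falling outside $G_\alpha^*$ lies outside the smallest region that carries probability mass at least $\alpha$, which is why such sets are natural candidates for flagging anomalies.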