Data skeletons: simultaneous estimation of multiple quantiles for massive streaming datasets with applications to density estimation

  • Authors:
  • James P. McDermott; G. Jogesh Babu; John C. Liechty; Dennis K. Lin

  • Affiliations:
  • Department of Statistics, The Pennsylvania State University, University Park, USA 16802; Department of Statistics, The Pennsylvania State University, University Park, USA 16802; Departments of Marketing and Statistics, The Pennsylvania State University, University Park, USA 16802; Department of Supply Chain and Information Systems, The Pennsylvania State University, University Park, USA 16802

  • Venue:
  • Statistics and Computing
  • Year:
  • 2007

Abstract

We consider the problem of density estimation when the data arrive as a continuous stream of no fixed length. In this setting, the usual methods of density estimation, such as kernel density estimation, are difficult to implement. We propose a method of density estimation for massive datasets that takes the derivative of a smooth curve fit through a set of quantile estimates. To supply those estimates, we propose a low-storage, single-pass, sequential method for simultaneously estimating multiple quantiles of a massive dataset. For comparison, we also consider a sequential kernel density estimator. A simulation study shows that the proposed methods perform well and have several distinct advantages over existing methods.
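
The idea of the abstract can be illustrated with a minimal sketch, which is not the authors' estimator. It rests on two assumptions of its own: the quantiles are tracked with a plain Robbins-Monro stochastic-approximation update, and the smooth curve is replaced by a piecewise-linear CDF through the (quantile, probability) pairs, so that its derivative is a piecewise-constant density. The names StreamingQuantiles, density_from_quantiles, and the scale parameter are hypothetical.

```python
import random


class StreamingQuantiles:
    """Track several quantiles in one pass with O(len(probs)) storage.

    Assumption: a plain Robbins-Monro stochastic-approximation update,
    used here as a stand-in for the estimator proposed in the paper.
    """

    def __init__(self, probs, scale=1.0):
        self.probs = list(probs)
        self.scale = scale      # hypothetical step-size constant
        self.estimates = None   # one running estimate per probability
        self.n = 0              # observations seen so far

    def update(self, x):
        self.n += 1
        if self.estimates is None:
            # Initialise every estimate at the first observation.
            self.estimates = [x] * len(self.probs)
            return
        step = self.scale / self.n
        for i, p in enumerate(self.probs):
            indicator = 1.0 if x <= self.estimates[i] else 0.0
            # Move up when too few points fall below, down when too many.
            self.estimates[i] += step * (p - indicator)

    def quantiles(self):
        # Independent updates can cross, so sort before use.
        return sorted(self.estimates)


def density_from_quantiles(probs, quantiles):
    """Differentiate the piecewise-linear CDF through (quantile, prob) pairs.

    Returns one (left, right, density) tuple per interval between
    adjacent quantiles; zero-width intervals are skipped.
    """
    pieces = []
    for i in range(len(probs) - 1):
        width = quantiles[i + 1] - quantiles[i]
        if width > 0:
            slope = (probs[i + 1] - probs[i]) / width
            pieces.append((quantiles[i], quantiles[i + 1], slope))
    return pieces


# Usage: stream uniform(0, 1) draws, whose true density is 1 on [0, 1].
random.seed(0)
probs = [0.1, 0.25, 0.5, 0.75, 0.9]
tracker = StreamingQuantiles(probs)
for _ in range(20000):
    tracker.update(random.random())
for left, right, dens in density_from_quantiles(probs, tracker.quantiles()):
    print(f"[{left:.3f}, {right:.3f}): density ~ {dens:.3f}")
```

The tracker stores only one number per quantile and never revisits past observations, which is what makes a single pass over an unbounded stream feasible. The step size scale / n is tuned here for data on roughly a unit scale; other data would likely need a different constant. A smooth monotone interpolant in place of the piecewise-linear CDF would give a continuous density closer to what the abstract describes.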