We consider the problem of density estimation when the data arrive as a continuous stream with no fixed length. In this setting, the usual density estimators, such as kernel density estimation, are problematic to implement because they typically require storing all observations. We propose a method of density estimation for massive datasets that takes the derivative of a smooth curve fit through a set of quantile estimates. To supply these estimates, we propose a low-storage, single-pass, sequential method for simultaneously estimating multiple quantiles of a massive dataset; these quantile estimates form the basis of the density estimator. For comparison, we also consider a sequential kernel density estimator. A simulation study shows that the proposed methods perform well and have several distinct advantages over existing methods.
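The quantiles-then-derivative idea can be illustrated with a minimal sketch under stated assumptions. It is not the procedure proposed here. The quantile tracker swaps in a textbook Robbins-Monro stochastic-approximation update, q ← q + (gain/n)(p − 1[x ≤ q]), which keeps O(k) memory for k probability levels after a small warm-up buffer. The warm-up size of 1000 and the gain of 4 × (warm-up IQR) are illustrative choices. A piecewise-linear interpolant stands in for the smooth curve, so its derivative is a histogram-like step density with heights (p_{i+1} − p_i)/(q_{i+1} − q_i). The names `StreamingQuantiles`, `density_from_quantiles` and `density_at` are hypothetical.

```python
import random
from bisect import bisect_right


class StreamingQuantiles:
    """Hypothetical sketch: one pass, fixed memory, several quantiles at once.

    Uses a Robbins-Monro update (an assumed stand-in, not the paper's method):
        q <- q + (gain / n) * (p - 1[x <= q])
    """

    def __init__(self, probs, warmup=1000):
        self.probs = sorted(probs)   # target probability levels p_1 < ... < p_k
        self.warmup = warmup         # observations buffered before sequential updates start
        self.buffer = []
        self.q = None                # current quantile estimates, one per level
        self.n = 0                   # number of observations seen so far
        self.gain = 1.0

    def _from_sorted(self, xs):
        m = len(xs)
        return [xs[min(m - 1, int(p * m))] for p in self.probs]

    def update(self, x):
        self.n += 1
        if self.q is None:
            self.buffer.append(x)
            if len(self.buffer) == self.warmup:
                xs = sorted(self.buffer)
                self.q = self._from_sorted(xs)   # initialize from warm-up sample quantiles
                # Assumed step scale: a multiple of the warm-up IQR, so the update
                # behaves similarly whatever units the data are measured in.
                iqr = xs[int(0.75 * len(xs))] - xs[int(0.25 * len(xs))]
                self.gain = 4.0 * iqr if iqr > 0 else 1.0
                self.buffer = []                 # release the warm-up storage
            return
        step = self.gain / self.n
        for i, p in enumerate(self.probs):
            # Move up by step*p if x > q, down by step*(1 - p) if x <= q.
            self.q[i] += step * (p - (1.0 if x <= self.q[i] else 0.0))

    def quantiles(self):
        if self.q is None:
            if not self.buffer:
                raise ValueError("no observations yet")
            return self._from_sorted(sorted(self.buffer))
        # Monotone rearrangement keeps the implied CDF non-decreasing.
        return sorted(self.q)


def density_from_quantiles(probs, qs):
    """Derivative of the piecewise-linear CDF through the points (q_i, p_i).

    Returns (left, right, height) bins with height (p1 - p0) / (q1 - q0).
    """
    bins = []
    for p0, q0, p1, q1 in zip(probs, qs, probs[1:], qs[1:]):
        if q1 > q0:                  # skip zero-width bins from tied estimates
            bins.append((q0, q1, (p1 - p0) / (q1 - q0)))
    return bins


def density_at(bins, x):
    """Step-density value at x; 0.0 is a placeholder outside [q_1, q_k]."""
    lefts = [b[0] for b in bins]
    i = bisect_right(lefts, x) - 1
    if i < 0 or x > bins[i][1]:
        return 0.0
    return bins[i][2]


# Usage: a simulated N(0, 1) stream, tracking the levels 0.05, 0.10, ..., 0.95.
rng = random.Random(0)
est = StreamingQuantiles([i / 20 for i in range(1, 20)])
for _ in range(100_000):
    est.update(rng.gauss(0.0, 1.0))
bins = density_from_quantiles(est.probs, est.quantiles())
# Compare with the N(0, 1) peak height 1/sqrt(2*pi), roughly 0.399.
print(density_at(bins, 0.0))
```

Differencing quantile estimates amplifies their noise, especially when adjacent levels are close together. That amplification is why fitting a smooth curve through the quantiles, rather than the piecewise-linear interpolant used here, matters in practice.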