Continuously Maintaining Quantile Summaries of the Most Recent N Elements over a Data Stream

  • Authors:
  • Xuemin Lin; Hongjun Lu; Jian Xu; Jeffrey Xu Yu

  • Venue:
  • ICDE '04 Proceedings of the 20th International Conference on Data Engineering
  • Year:
  • 2004

Abstract

Statistics over the most recently observed data elements are often required in applications involving data streams, such as intrusion detection in network monitoring, stock price prediction in financial markets, web log mining for access prediction, and user click stream mining for personalization. Among the various statistics, computing a quantile summary is probably the most challenging because of its complexity. In this paper, we study the problem of continuously maintaining a quantile summary of the most recently observed $N$ elements over a stream so that quantile queries can be answered with a guaranteed precision of $\epsilon N$. We developed a space-efficient algorithm for a pre-defined $N$ that requires only one scan of the input data stream and $O\left(\frac{\log(\epsilon^2 N)}{\epsilon} + \frac{1}{\epsilon^2}\right)$ space in the worst case. We also developed an algorithm that maintains quantile summaries for the most recent $N$ elements so that quantile queries on any most recent $n$ elements ($n \le N$) can be answered with a guaranteed precision of $\epsilon n$. The worst-case space requirement for this algorithm is only $O\left(\frac{\log^2(\epsilon N)}{\epsilon^2}\right)$. Our performance study indicated that not only is the actual quantile estimation error far below the guaranteed precision, but the space requirement is also much less than the given theoretical bound.
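To make the query semantics concrete ("answer a φ-quantile query over the last N elements with rank error at most εN"), here is a minimal Python sketch. It is not the authors' algorithm and does not attempt their space bounds. It uses a simple hypothetical block scheme chosen only because its error is easy to bound: the stream is cut into blocks of about εN/2 elements, each full block is sorted and thinned to every k-th element, and whole blocks are dropped once they fall entirely outside the window. The class name `SlidingWindowQuantiles` and its methods are illustrative; the sketch assumes distinct values, 0 < ε ≤ 1, and only the fixed-N setting (not the variable n ≤ N variant).

```python
import math
from collections import deque


class SlidingWindowQuantiles:
    """Hypothetical block-based summary (not the paper's method): answers
    phi-quantile queries over the most recent n elements with rank error
    below eps * n, assuming distinct values and 0 < eps <= 1."""

    def __init__(self, n: int, eps: float) -> None:
        assert n > 0 and 0 < eps <= 1
        self.n = n
        self.block = max(1, int(eps * n / 2))          # elements per sealed block
        self.step = max(1, int(eps * self.block / 2))  # thinning step inside a block
        self.seen = 0                                  # total elements observed
        self.sealed = deque()  # (block_index, [(value, weight), ...]), oldest first
        self.open = []         # current, not yet full block, kept exactly

    def insert(self, x) -> None:
        self.open.append(x)
        self.seen += 1
        if len(self.open) == self.block:
            s = sorted(self.open)
            b, k = self.block, self.step
            # Keep every k-th element in sorted order; each sample stands for the
            # k elements ending at it, and the last sample takes any remainder.
            samples = [(s[j], k) for j in range(k - 1, b, k)]
            if b % k:
                samples.append((s[-1], b % k))
            self.sealed.append((self.seen // b - 1, samples))
            self.open = []
        # Drop sealed blocks lying entirely before the window [seen - n, seen).
        while self.sealed and (self.sealed[0][0] + 1) * self.block <= self.seen - self.n:
            self.sealed.popleft()

    def query(self, phi: float):
        """Return a value whose rank in the window is within eps * n of
        ceil(phi * window_size). It may be an element that has just expired."""
        window = min(self.seen, self.n)
        target = max(1, math.ceil(phi * window))
        entries = [e for _, samples in self.sealed for e in samples]
        entries += [(x, 1) for x in self.open]
        entries.sort(key=lambda e: e[0])
        total = 0
        for value, weight in entries:
            total += weight
            if total >= target:
                return value
        raise ValueError("empty summary")
```

Why the error stays below εN in this sketch: the oldest retained block can still hold fewer than b expired elements, which can shift a rank down by at most b − 1 < εN; thinning undercounts by fewer than k ranks per block, and with b ≈ εN/2 and k ≈ εb/2 the total over all retained blocks also stays below εN. The paper's algorithms achieve far tighter space than this naive scheme.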