Data streams constitute the core of many traditional (e.g., financial) and emerging (e.g., environmental) applications. The sources of streams are ubiquitous in daily life (e.g., web clicks). One feature of these data is their high arrival rate. Thus, their processing entails a special constraint. Despite the exponential growth in the capacity of storage devices, it is very expensive, or even impossible, to store a data stream in its entirety. Consequently, queries are evaluated only on the recent data of the stream, and older data are expired. However, some applications need to query the whole data stream. The inability to store a complete stream therefore suggests storing a compact representation of its data, called a summary. These structures allow users to query the past without an explosion of the required storage space, to provide historical aggregated information, to perform data mining tasks, or to detect anomalous behavior in computer systems. The side effect of using summaries is that queries over historical data may not return exact answers, but only approximate ones. This paper introduces a new approach that offers a trade-off between the accuracy of query results and the time consumed in building summaries.