Efficient trade-off between speed processing and accuracy in summarizing data streams

Authors:
Nesrine Gabsi; Fabrice Clérot; Georges Hébrail
Affiliations:
France Télécom R&D, Lannion, France; France Télécom R&D, Lannion, France; Institut TELECOM, TELECOM ParisTech, Paris, France
Venue:
PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
Year:
2010



Abstract

Data streams constitute the core of many traditional (e.g., financial) and emerging (e.g., environmental) applications. The sources of streams are ubiquitous in daily life (e.g., web clicks). One feature of these data is their high arrival speed, so processing them entails a special constraint. Despite the exponential growth in the capacity of storage devices, it is very expensive, even impossible, to store a data stream in its entirety. Consequently, queries are evaluated only on the recent data of the stream, and older data expire. However, some applications need to query the whole data stream. The inability to store a complete stream therefore suggests storing a compact representation of its data, called a summary. Summaries allow users to query the past without an explosion of the required storage space, to provide historical aggregated information, to perform data mining tasks, or to detect anomalous behavior in computer systems. The side effect of using summaries is that queries over historical data may not return exact answers, but only approximate ones. This paper introduces a new approach that trades off the accuracy of query results against the time consumed in building summaries.