Efficient trade-off between speed processing and accuracy in summarizing data streams

  • Authors:
  • Nesrine Gabsi; Fabrice Clérot; Georges Hébrail

  • Affiliations:
  • France Télécom RD, Lannion, France; France Télécom RD, Lannion, France; Institut TELECOM, TELECOM ParisTech, Paris, France

  • Venue:
  • PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
  • Year:
  • 2010

Abstract

Data streams constitute the core of many traditional (e.g., financial) and emerging (e.g., environmental) applications. The sources of streams are ubiquitous in daily life (e.g., web clicks). One feature of these data is the high speed of their arrival. Thus, their processing entails a special constraint. Despite the exponential growth in the capacity of storage devices, it is very expensive, even impossible, to store a data stream in its entirety. Consequently, queries are evaluated only on the recent data of the stream; older data are expired. However, some applications need to query the whole data stream. Therefore, the inability to store a complete stream suggests the storage of compact representations of its data, called summaries. These structures allow users to query the past without an explosion of the required storage space, to provide historical aggregated information, to perform data mining tasks, or to detect anomalous behavior in computer systems. The side effect of using summaries is that queries over historical data may not return exact answers, but only approximate ones. This paper introduces a new approach which is a trade-off between the accuracy of query results and the time consumed in building summaries.