Dynamic adaptive data structures for monitoring data streams

Authors:
J. Aguilar-Saborit;P. Trancoso;V. Muntes-Mulero;J. L. Larriba-Pey
Affiliations:
IBM Toronto Laboratory, 8200 Warden Avenue, Markham, ON, Canada L6G1C7;Department of Computer Science, University of Cyprus, Nicosia, Cyprus;DAMA-UPC, Computer Architecture Department, Universitat Politecnica de Catalunya, Spain;DAMA-UPC, Computer Architecture Department, Universitat Politecnica de Catalunya, Spain
Venue:
Data & Knowledge Engineering
Year:
2008


Abstract

Monitoring data streams is important in many application areas. The relative importance of accuracy, response time, memory use, and adaptability to changing data varies with the situation; Web page access monitoring, approximate aggregation in relational queries, and IP message routing illustrate the range of these needs. Several data structures address this problem, including counting Bloom filters, spectral Bloom filters, and dynamic count filters. These structures range from static to complex dynamic representations of the data stream, and each keeps an approximate count of the number of occurrences of each data value. In this paper, we focus on three main aspects. First, we put the problem in perspective and review the existing static and dynamic solutions. Second, we propose and analyze in depth a simple yet powerful partitioning strategy that reinforces the advantages of the existing methods while solving most of their drawbacks. Finally, using real executions and mathematical models, we evaluate the existing methods alone and in combination with our partitioning strategy. We show that our partitioning strategy reduces memory requirements and average response time and improves adaptiveness to changing data characteristics, while leaving the accuracy of the partitioned dynamic data structures intact.
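
For readers unfamiliar with the structures named in the abstract, the following is a minimal Python sketch of a static counting Bloom filter that keeps an approximate per-value occurrence count. It is a generic illustration only, not the paper's implementation and not its partitioning strategy. The class name, the parameter names, and the use of salted SHA-256 to derive positions are all assumptions made for this example.

```python
import hashlib


class CountingBloomFilter:
    """Illustrative static counting filter with m counters and k hashes.

    All names here are hypothetical; nothing is taken from the paper.
    """

    def __init__(self, num_counters: int, num_hashes: int) -> None:
        self.m = num_counters
        self.k = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, value: str):
        # Assumption: derive k positions by salting a single hash function.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, value: str) -> None:
        # Record one occurrence by incrementing every counter for the value.
        for pos in self._positions(value):
            self.counters[pos] += 1

    def delete(self, value: str) -> None:
        # Only meaningful for values that were inserted earlier; the guard
        # keeps counters from going negative on misuse.
        for pos in self._positions(value):
            if self.counters[pos] > 0:
                self.counters[pos] -= 1

    def estimate(self, value: str) -> int:
        # Collisions can only inflate counters, so the minimum over the
        # value's counters never underestimates the true count.
        return min(self.counters[pos] for pos in self._positions(value))


# Usage: count page accesses in a tiny hypothetical stream.
cbf = CountingBloomFilter(num_counters=1024, num_hashes=4)
for page in ["/home", "/home", "/about", "/home"]:
    cbf.insert(page)
print(cbf.estimate("/home"))  # at least 3; larger only if counters collide
```

The estimate is one-sided: it can exceed the true count when other values share counters, but it never falls below it as long as deletions are applied only to values that were inserted. In this sketch the number of counters is fixed at construction time, which corresponds to the static end of the range the abstract describes.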