Characterizing and Exploiting Reference Locality in Data Stream Applications

Authors:
Feifei Li;Ching Chang;George Kollios;Azer Bestavros
Affiliations:
Boston University;Boston University;Boston University;Boston University
Venue:
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Year:
2006

Citing 0
Cited 8

Modeling skew in data streams
Proceedings of the 2006 ACM SIGMOD international conference on Management of data

GrubJoin: An Adaptive, Multi-Way, Windowed Stream Join with Time Correlation-Aware CPU Load Shedding
IEEE Transactions on Knowledge and Data Engineering

Probabilistic lossy counting: an efficient algorithm for finding heavy hitters
ACM SIGCOMM Computer Communication Review

Data-driven memory management for stream join
Information Systems

Small synopses for group-by query verification on outsourced data streams
ACM Transactions on Database Systems (TODS)

Evaluating top-k queries over incomplete data streams
Proceedings of the 18th ACM conference on Information and knowledge management

RRPJ: result-rate based progressive relational join
DASFAA'07 Proceedings of the 12th international conference on Database systems for advanced applications

Danaïdes: continuous and progressive complex queries on RSS feeds
DASFAA'07 Proceedings of the 12th international conference on Database systems for advanced applications

Quantified Score

Hi-index	0.00

Abstract

In this paper, we investigate a new approach to processing queries in data stream applications. We show that the reference locality characteristics of data streams can be exploited in the design of superior and flexible data stream query processing techniques. We identify two different causes of reference locality: popularity over long time scales and temporal correlations over shorter time scales. An elegant mathematical model is shown to precisely quantify the degree of those sources of locality. Furthermore, we analyze the impact of locality-awareness on achievable performance gains over traditional algorithms on applications such as MAX-subset approximate sliding window join and approximate count estimation. In a comprehensive experimental study, we compare several existing algorithms against our locality-aware algorithms over a number of real datasets. The results validate the usefulness and efficiency of our approach.
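
The two sources of locality named in the abstract can be illustrated with a small sketch. This is not the paper's mathematical model, which the abstract does not spell out. It is a simple, commonly used heuristic: shuffling a stream preserves item popularity but destroys temporal correlation, so comparing the gaps between repeated references before and after shuffling hints at how much locality comes from each source. The function names (`reuse_gaps`, `locality_summary`) and the choice of summary statistics are assumptions made for illustration only.

```python
import random
from collections import Counter


def reuse_gaps(stream):
    """Return the distance, in stream positions, between each reference
    to an item and the previous reference to the same item."""
    last_seen = {}
    gaps = []
    for pos, item in enumerate(stream):
        if item in last_seen:
            gaps.append(pos - last_seen[item])
        last_seen[item] = pos
    return gaps


def mean(values):
    """Arithmetic mean, or NaN for an empty list."""
    return sum(values) / len(values) if values else float("nan")


def locality_summary(stream, seed=0):
    """Summarize popularity and temporal correlation (illustrative only).

    top_item_share    : share of references going to the most popular item
                        (a crude popularity indicator).
    mean_gap_original : mean re-reference gap in the stream as observed.
    mean_gap_shuffled : mean re-reference gap after a random shuffle, which
                        keeps popularity but removes temporal correlation.
    """
    stream = list(stream)
    shuffled = list(stream)
    random.Random(seed).shuffle(shuffled)
    counts = Counter(stream)
    return {
        "top_item_share": counts.most_common(1)[0][1] / len(stream),
        "mean_gap_original": mean(reuse_gaps(stream)),
        "mean_gap_shuffled": mean(reuse_gaps(shuffled)),
    }


if __name__ == "__main__":
    # A bursty toy stream: each item arrives in one contiguous run.
    bursty = ["a"] * 6 + ["b"] * 3 + ["c"] * 3
    print(locality_summary(bursty))
```

If the observed mean gap is much smaller than the shuffled one, repeated references are clustered in time beyond what popularity alone would produce, which is the short-time-scale correlation the abstract refers to.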