Window join approximation over data streams with importance semantics

Authors:
Adegoke Ojewole;Qiang Zhu;Wen-Chi Hou
Affiliations:
University of Michigan, Dearborn, Dearborn, MI;University of Michigan, Dearborn, Dearborn, MI;Southern Illinois University, Carbondale, IL
Venue:
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Year:
2006

Abstract

Load shedding techniques generate approximate sliding window join results when memory constraints prevent exact computation. The previously proposed random load shedding method drops input tuples without regard to the number of outputs they would produce, while the more recently proposed semantic load shedding technique aims to produce the largest possible result set. We consider a new model in which data stream tuples carry numerical importance values relevant to the query source, and we seek to maximize the importance of the approximate join result. We show that both random load shedding and semantic load shedding are suboptimal in this setting, while the techniques presented in this paper satisfy the objective function by considering both tuple importance and join attribute distributions. We extend the existing offline semantic approximation technique to make it compatible with our objective function and show that it is less space- and time-efficient than our new optimal offline algorithm for both small and large join memory allotments. We also introduce four efficient online algorithms that show promise in maximizing the importance of the approximate join result without foreknowledge of the input streams.
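
To make the setting concrete, the following is a minimal Python sketch of a memory-limited sliding window join in which the eviction policy can be swapped. It is not any of the algorithms from the paper. Everything in it is an assumption made for illustration: the names `StreamTuple`, `approximate_join`, and the three victim policies are hypothetical, the importance of an output pair is assumed to be the sum of the importance values of its two input tuples, and the "least important" policy is a naive greedy heuristic that ignores join attribute distributions, which the abstract says an effective technique must also take into account.

```python
import random
from collections import namedtuple

# Hypothetical tuple layout: source stream ("R" or "S"), arrival time,
# join attribute value, and a numerical importance value.
StreamTuple = namedtuple("StreamTuple", ["stream", "time", "key", "importance"])


def approximate_join(arrivals, memory_per_stream, window, choose_victim):
    """Run a memory-limited sliding window equi-join over two streams.

    Returns the total importance of the produced join result, where the
    importance of an output pair is ASSUMED to be the sum of the importance
    values of the two joined tuples.
    """
    memory = {"R": [], "S": []}
    total_importance = 0.0
    for t in arrivals:
        other = "S" if t.stream == "R" else "R"
        # Expire tuples that have fallen out of the time-based window.
        for name in memory:
            memory[name] = [u for u in memory[name] if t.time - u.time < window]
        # Probe the opposite stream's join memory for matching keys.
        for u in memory[other]:
            if u.key == t.key:
                total_importance += t.importance + u.importance
        # Insert the new tuple, shedding one tuple if the memory overflows.
        own = memory[t.stream]
        own.append(t)
        if len(own) > memory_per_stream:
            own.remove(choose_victim(own))
    return total_importance


def make_random_victim(seed):
    """Random load shedding: evict a uniformly random stored tuple."""
    rng = random.Random(seed)
    return lambda candidates: rng.choice(candidates)


def newest_victim(candidates):
    """Baseline: drop the most recent arrival, keeping memory unchanged."""
    return max(candidates, key=lambda u: u.time)


def least_important_victim(candidates):
    """Naive greedy heuristic: evict the tuple with the lowest importance."""
    return min(candidates, key=lambda u: u.importance)


if __name__ == "__main__":
    demo = [
        StreamTuple("R", 0, "a", 1.0),
        StreamTuple("R", 1, "b", 10.0),
        StreamTuple("S", 2, "b", 5.0),
    ]
    # With one memory slot per stream, the policy decides which R tuple survives.
    print(approximate_join(demo, 1, 10, least_important_victim))  # 15.0
    print(approximate_join(demo, 1, 10, newest_victim))           # 0.0
```

In this toy input, keeping the high-importance tuple happens to preserve the only matching pair. A low-importance tuple that joins with many partners could contribute more in total than a high-importance tuple that joins with none, which is why a policy based on importance alone is only a starting point.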