Merging multiple data streams on common keys over high performance networks

Authors:
Marco Mazzucco;Asvin Ananthanarayan;Robert L. Grossman;Jorge Levera;Gokulnath Bhagavantha Rao
Affiliations:
University of Illinois at Chicago;University of Illinois at Chicago;University of Illinois at Chicago;University of Illinois at Chicago;University of Illinois at Chicago
Venue:
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Year:
2002

Abstract

The model for data mining on streaming data assumes a buffer of fixed length N and a data stream of infinite length. The challenge is to extract patterns, changes, anomalies, and statistically significant structures by examining the data only once and storing records and derived attributes of length less than N. As data grids, data webs, and semantic webs become more common, mining distributed streaming data will become increasingly important. The first step when presented with two or more distributed streams is to merge them using a common key. In this paper, we present two algorithms for merging streaming data using a common key. We also present experimental studies showing that these algorithms scale in practice to OC-12 networks.
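
The abstract does not describe the paper's two algorithms, so the following is only an illustrative sketch of the general problem it states: merging two streams on a common key in a single pass with bounded memory. It uses a symmetric hash join with a fixed-size buffer per stream, which is a swapped-in textbook technique and not necessarily either of the authors' methods. The names `merge_streams` and `buffer_size`, the (key, value) record layout, and the assumption that keys are unique within each stream are all hypothetical.

```python
from collections import OrderedDict
from itertools import zip_longest

_END = object()  # sentinel marking an exhausted stream


def merge_streams(left, right, buffer_size):
    """Yield (key, left_value, right_value) for records sharing a key.

    Each stream is an iterable of (key, value) pairs and is read once.
    At most `buffer_size` unmatched records are kept per stream; the
    oldest is evicted first, so a match that arrives too late is missed.
    Assumption (not from the paper): keys are unique within a stream.
    """
    pending_left = OrderedDict()   # unmatched records seen on the left
    pending_right = OrderedDict()  # unmatched records seen on the right

    def process(record, own, other, is_left):
        key, value = record
        if key in other:
            # The partner record is already buffered: emit the merged row.
            match = other.pop(key)
            return (key, value, match) if is_left else (key, match, value)
        # No partner yet: buffer the record, evicting the oldest if full.
        own[key] = value
        if len(own) > buffer_size:
            own.popitem(last=False)
        return None

    # Alternate between the two streams, one record at a time.
    for l_rec, r_rec in zip_longest(left, right, fillvalue=_END):
        if l_rec is not _END:
            out = process(l_rec, pending_left, pending_right, True)
            if out is not None:
                yield out
        if r_rec is not _END:
            out = process(r_rec, pending_right, pending_left, False)
            if out is not None:
                yield out


left = [(1, "a"), (2, "b"), (3, "c")]
right = [(3, "z"), (1, "x"), (4, "w")]
merged = list(merge_streams(left, right, buffer_size=2))
```

The buffer bound is the design trade-off in this sketch: a larger `buffer_size` tolerates more skew between the arrival order of matching keys, while a smaller one uses less memory but drops matches whose partner has already been evicted.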