A Generic Local Algorithm for Mining Data Streams in Large Distributed Systems

Authors:
Ran Wolff;Kanishka Bhaduri;Hillol Kargupta
Affiliations:
Haifa University, Haifa;University of Maryland, Baltimore County, Baltimore;University of Maryland, Baltimore County, Baltimore
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2009

Abstract

In a large network of computers or wireless sensors, each of the components (henceforth, peers) has some data about the global state of the system. Much of the system's functionality, such as message routing, information retrieval, and load sharing, relies on modeling the global state. We refer to the outcome of such a modeling function (e.g., the load experienced by each peer) as the \emph{model} of the system. Since the state of the system is constantly changing, it is necessary to keep the models up to date. Computing global data mining models (e.g., decision trees or $k$-means clustering) in large distributed systems may be very costly due to the scale of the system and the potentially high cost of communication. The cost increases further in a dynamic scenario in which the data changes rapidly. In this paper we describe a two-step approach for dealing with these costs. First, we describe a highly efficient \emph{local} algorithm that can be used to monitor a wide class of data mining models. Then, we use this algorithm as a feedback loop for monitoring complex functions of the data, such as its $k$-means clustering. The theoretical claims are corroborated by a thorough experimental analysis.
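
The two-step idea in the abstract (monitor cheaply, recompute the model only when monitoring raises an alarm) can be illustrated with a small sketch. This is not the paper's local algorithm: it is a centralized toy that assumes the model is simply the mean of the data and that the monitored condition is whether the current mean has drifted more than a threshold `epsilon` from that model. The names `mean_vector`, `model_is_stale`, and `monitor_and_recompute` are hypothetical and chosen only for illustration.

```python
import math


def mean_vector(points):
    """Component-wise average of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def model_is_stale(points, model, epsilon):
    """Monitoring step (assumed condition): True when the current mean
    lies farther than epsilon from the stored model."""
    return math.dist(mean_vector(points), model) > epsilon


def monitor_and_recompute(stream, epsilon):
    """Feedback loop: recompute the model only when monitoring raises an alarm.

    `stream` is an iterable of data snapshots (lists of points).
    Returns the final model and the number of recomputations performed.
    """
    model, recomputations = None, 0
    for snapshot in stream:
        if model is None or model_is_stale(snapshot, model, epsilon):
            model = mean_vector(snapshot)  # the costly step, run only on alarm
            recomputations += 1
    return model, recomputations


# Two snapshots with the same mean, then one where the data has shifted.
snapshots = [
    [(0.0, 0.0), (0.5, 0.0)],
    [(0.25, 0.25), (0.25, -0.25)],
    [(5.0, 5.0), (5.5, 5.0)],
]
final_model, count = monitor_and_recompute(snapshots, epsilon=0.5)
print(final_model, count)
```

In this toy the second snapshot leaves the model untouched because its mean has not moved, so only the initial computation and the shift in the third snapshot trigger a recomputation. In the distributed setting the abstract describes, the monitoring step would be carried out by the peers themselves rather than by a central loop.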