Clustering distributed data streams in peer-to-peer environments

Authors:
Sanghamitra Bandyopadhyay;Chris Giannella;Ujjwal Maulik;Hillol Kargupta;Kun Liu;Souptik Datta
Affiliations:
Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, United States (all authors)
Venue:
Information Sciences: an International Journal
Year:
2006

Abstract

This paper describes a technique for clustering homogeneously distributed data in a peer-to-peer environment such as a sensor network. The technique is based on the principles of the K-Means algorithm and works in a localized, asynchronous manner: each node communicates with its neighboring nodes. The paper offers an extensive theoretical analysis that bounds the error of the distributed clustering process relative to the centralized approach, which requires downloading all the observed data to a single site. Experimental results show that the communication cost of the proposed approach, an important consideration in sensor networks whose nodes are typically equipped with limited battery power, is significantly smaller than that of transmitting all the data to a central location and applying a conventional clustering algorithm there. At the same time, the accuracy of the obtained centroids is high and the number of incorrectly labeled samples is small.
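The general idea of neighbor-based K-Means can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the update rule, so the rule used here (each node averages its own per-cluster sums and counts with those of its direct neighbors) is an assumption, and the rounds are simulated synchronously in one process rather than asynchronously. The names `nearest`, `local_stats`, `p2p_kmeans`, and the four-node ring topology are hypothetical.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def local_stats(points, centroids):
    """Per-cluster coordinate sums and point counts for one node's local data."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        j = nearest(p, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def p2p_kmeans(node_data, neighbors, init_centroids, rounds=10):
    """Simulate neighbor-only K-Means; returns one centroid list per node.

    Assumed update rule: a node pools its own (sums, counts) with those of
    its direct neighbors and takes the weighted mean. No node sees all data.
    """
    centroids = [[list(c) for c in init_centroids] for _ in node_data]
    for _ in range(rounds):
        # Every node summarizes its local data against its current centroids.
        stats = [local_stats(x, c) for x, c in zip(node_data, centroids)]
        updated = []
        for i, old in enumerate(centroids):
            group = [i] + list(neighbors[i])  # the node plus its neighbors
            new = []
            for j, old_c in enumerate(old):
                n = sum(stats[g][1][j] for g in group)
                if n == 0:
                    new.append(list(old_c))  # empty cluster: keep old centroid
                else:
                    new.append([sum(stats[g][0][j][d] for g in group) / n
                                for d in range(len(old_c))])
            updated.append(new)
        centroids = updated
    return centroids


# Hypothetical setup: 4 nodes in a ring, each observing both clusters.
rng = random.Random(0)
true_centers = [(0.0, 0.0), (10.0, 10.0)]
node_data = [
    [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
     for cx, cy in true_centers for _ in range(50)]
    for _ in range(4)
]
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
result = p2p_kmeans(node_data, neighbors, [(1.0, 1.0), (9.0, 9.0)])
```

In this sketch a node exchanges only per-cluster sums and counts with its neighbors, never raw samples, which is one way the communication cost could stay below that of shipping all data to a single site.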