Online clustering of parallel data streams

Authors:
Jürgen Beringer;Eyke Hüllermeier
Affiliations:
Fakultät für Informatik, Otto-von-Guericke-Universität, Magdeburg, Germany;Fakultät für Informatik, Otto-von-Guericke-Universität, Magdeburg, Germany
Venue:
Data & Knowledge Engineering
Year:
2006

Abstract

In recent years, the management and processing of data streams has become an active research topic in several fields of computer science, including distributed systems, database systems, and data mining. A data stream can roughly be thought of as a transient, continuously growing sequence of time-stamped data. In this paper, we consider the problem of clustering parallel streams of real-valued data, that is, continuously evolving time series. In other words, we are interested in grouping data streams whose evolution over time is similar in a specific sense. To maintain an up-to-date clustering structure, the incoming data must be analyzed online, tolerating no more than a constant time delay. For this purpose, we develop an efficient online version of the classical K-means clustering algorithm. The method's efficiency is mainly due to a scalable online transformation of the original data, which allows fast computation of approximate distances between streams.
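
The abstract does not name the transformation or give algorithmic details, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes, purely for illustration, that each stream is summarized by the first few discrete Fourier coefficients of a sliding window, updated in constant time per new value, and that K-means is run on those summaries. The names `SlidingDFT`, `approx_sq_dist`, and `kmeans_step`, the window length `W`, and the coefficient count `M` are hypothetical and not taken from the paper.

```python
import cmath
import math
from collections import deque


class SlidingDFT:
    """Keeps the first m DFT coefficients of the last w values of one stream.

    Assumption for this sketch: the window starts zero-filled and m <= w // 2.
    """

    def __init__(self, w, m):
        self.w, self.m = w, m
        self.window = deque([0.0] * w, maxlen=w)
        self.coeffs = [0j] * m
        # Rotation factors e^{2*pi*i*k/w}, one per tracked coefficient.
        self.twiddle = [cmath.exp(2j * cmath.pi * k / w) for k in range(m)]

    def update(self, x_new):
        """Slide the window by one value; cost is O(m), independent of w."""
        x_old = self.window[0]
        self.window.append(x_new)  # maxlen evicts the oldest value
        delta = x_new - x_old
        self.coeffs = [(c + delta) * t for c, t in zip(self.coeffs, self.twiddle)]


def approx_sq_dist(a, b, w):
    """Lower bound on the squared Euclidean distance between two real windows.

    Uses Parseval's identity restricted to the kept coefficients; indices
    1..m-1 are doubled to account for their complex-conjugate mirrors.
    """
    d0 = abs(a[0] - b[0]) ** 2
    rest = sum(abs(x - y) ** 2 for x, y in zip(a[1:], b[1:]))
    return (d0 + 2.0 * rest) / w


def kmeans_step(points, centers, w):
    """One K-means iteration in coefficient space: assign, then re-average."""
    labels = [
        min(range(len(centers)), key=lambda j: approx_sq_dist(p, centers[j], w))
        for p in points
    ]
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centers.append(old)  # keep an empty cluster's center unchanged
    return labels, new_centers


# Toy data: two streams at one frequency, two at another, with small offsets.
W, M = 16, 4
signals = [
    lambda t: math.sin(2 * math.pi * t / W),
    lambda t: math.sin(2 * math.pi * t / W) + 0.1,
    lambda t: math.sin(2 * math.pi * 3 * t / W),
    lambda t: math.sin(2 * math.pi * 3 * t / W) + 0.1,
]
trackers = [SlidingDFT(W, M) for _ in signals]
for t in range(40):
    for tr, s in zip(trackers, signals):
        tr.update(s(t))

points = [tr.coeffs for tr in trackers]
centers = [points[0], points[2]]  # seed the two clusters from two streams
labels, centers = kmeans_step(points, centers, W)
print(labels)
```

Because the Fourier transform is linear, averaging coefficient vectors is equivalent to averaging the underlying windows, so the cluster centers can be maintained entirely in the compressed representation. In a streaming setting, one would presumably rerun a few such iterations after each batch of updates, starting from the previous centers.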