Precise anytime clustering of noisy sensor data with logarithmic complexity

Authors:
Marwan Hassani; Philipp Kranen; Thomas Seidl
Affiliations:
RWTH Aachen University, Germany; RWTH Aachen University, Germany; RWTH Aachen University, Germany
Venue:
Proceedings of the Fifth International Workshop on Knowledge Discovery from Sensor Data
Year:
2011

Abstract

Clustering of streaming sensor data aims to provide online summaries of the observed stream. This task is usually carried out under limited processing and storage resources, which makes the speed of the sensed stream (data items per unit of time) a critical constraint when designing stream clustering algorithms. Moreover, a varying stream speed is a natural characteristic of sensor data, e.g. when the sampling rate changes upon detecting an event or for a certain period of time. In such cases, most clustering algorithms have to heavily restrict their model size so that they can handle the minimal time allowance. Recently, the first anytime stream clustering algorithm was proposed, which flexibly uses all available time and dynamically adapts its model size. However, that method was not designed to precisely cluster sensor data, which are usually noisy and highly evolving. In this paper we detail the LiarTree algorithm, which provides precise stream summaries and effectively handles noise, drift and novelty. We prove that the runtime of the LiarTree is logarithmic in the size of the maintained model, as opposed to the linear time complexity often observed in previous approaches. In an extensive experimental evaluation on synthetic and real sensor datasets, we demonstrate that the LiarTree outperforms competing approaches in the quality of the resulting summaries and exhibits only logarithmic time complexity.
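
The difference between logarithmic and linear cost per stream item, and the idea of using only as much time as is available, can be illustrated with a minimal sketch. This is not the LiarTree algorithm, whose details the abstract does not give. It assumes a hypothetical balanced tree of cluster summaries (point count and linear sum per node), a fixed fanout, and a level budget that stands in for the time available before the next item arrives. The names Node, build and anytime_insert are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Summary of a subtree: point count and per-dimension linear sum."""
    n: int = 0
    ls: List[float] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)
    # Points parked here when the budget ran out before a leaf was reached.
    buffer: List[List[float]] = field(default_factory=list)

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]

    def absorb(self, x: List[float]) -> None:
        """Add one point to this node's summary."""
        if not self.ls:
            self.ls = [0.0] * len(x)
        self.n += 1
        self.ls = [s + v for s, v in zip(self.ls, x)]


def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def build(points: List[List[float]], fanout: int) -> Node:
    """Bottom-up build of a balanced tree, grouping neighbours in input order."""
    level = []
    for p in points:
        leaf = Node()
        leaf.absorb(p)
        level.append(leaf)
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            parents.append(Node(
                n=sum(c.n for c in group),
                ls=[sum(vals) for vals in zip(*(c.ls for c in group))],
                children=group,
            ))
        level = parents
    return level[0]


def anytime_insert(root: Node, x: List[float], max_levels: int) -> Tuple[int, int]:
    """Descend toward the closest child for at most max_levels levels.

    Returns (levels descended, distance computations performed).
    """
    node, levels, comps = root, 0, 0
    node.absorb(x)
    while node.children and levels < max_levels:
        comps += len(node.children)
        node = min(node.children, key=lambda c: sq_dist(c.centroid(), x))
        node.absorb(x)
        levels += 1
    if node.children:
        # Interrupted above the leaf level: park the point at this node.
        node.buffer.append(x)
    return levels, comps
```

With 64 leaf summaries and a fanout of 4, a full descent in this sketch visits 3 levels and performs 12 distance computations, whereas a flat scan over all leaves would need 64. With a budget of one level, the point is summarised at the first inner node and parked there, so a coarser but still valid summary exists whenever the insertion is interrupted.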