Clustering of streaming sensor data aims to provide online summaries of the observed stream. This task is usually performed under limited processing and storage resources, which makes the speed of the sensed stream (data items per unit time) a critical constraint when designing stream clustering algorithms. Moreover, varying stream speed is a natural characteristic of sensor data, e.g. when the sampling rate changes upon detecting an event or for a certain period of time. In such cases, most clustering algorithms have to heavily restrict their model size so that they can handle the minimal time allowance. Recently, the first anytime stream clustering algorithm was proposed, which flexibly uses all available time and dynamically adapts its model size. However, that method was not designed to precisely cluster sensor data, which are usually noisy and highly evolving. In this paper we detail the LiarTree algorithm, which provides precise stream summaries and effectively handles noise, drift and novelty. We prove that the runtime of the LiarTree is logarithmic in the size of the maintained model, as opposed to the linear time complexity often observed in previous approaches. In an extensive experimental evaluation on synthetic and real sensor datasets, we demonstrate that the LiarTree outperforms competing approaches in the quality of the resulting summaries and exhibits only logarithmic time complexity.