Clustering streaming data requires algorithms that can update the clustering result as new data arrives. Because data arrives continuously, processing time is limited: clustering has to be performed in a single pass over the incoming data and within the possibly varying inter-arrival times of the stream. Memory is likewise limited, making it impossible to store all data. Clustering therefore faces the challenge of maintaining a current result that can be presented to the user at any given time. In this work, we propose a parameter-free algorithm that automatically adapts to the speed of the data stream. It makes the best use of the time available under the current constraints to provide a clustering of the objects seen up to that point. Our approach incorporates the age of the objects to reflect the greater importance of more recent data. Moreover, it can detect concept drift, novelty, and outliers in the stream. For efficient and effective handling, we introduce the ClusTree, a compact and self-adaptive index structure for maintaining stream summaries. Our experiments show that our approach handles a multitude of different stream characteristics for accurate and scalable anytime stream clustering.
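To make the idea of an age-aware stream summary concrete, the following is a minimal sketch of a single decayed summary that is updated in one pass and in constant memory. It is not the ClusTree itself: the abstract does not specify the decay function or the summary layout, so the exponential fading with base 2, the class name `DecayedSummary`, and the `decay_rate` parameter are all assumptions made for illustration.

```python
class DecayedSummary:
    """Hypothetical one-pass summary of a stream, weighting recent points more.

    Stores only a weight, a linear sum, and a squared sum per dimension,
    so memory use is independent of the number of points seen.
    """

    def __init__(self, dim, decay_rate=0.1):
        self.decay_rate = decay_rate      # assumed parameter: larger forgets faster
        self.weight = 0.0                 # decayed number of points
        self.linear_sum = [0.0] * dim     # decayed sum of coordinates
        self.squared_sum = [0.0] * dim    # decayed sum of squared coordinates
        self.last_update = 0.0            # timestamp of the last fade

    def _fade(self, now):
        # Assumed decay function: 2^(-decay_rate * elapsed_time).
        factor = 2.0 ** (-self.decay_rate * (now - self.last_update))
        self.weight *= factor
        self.linear_sum = [v * factor for v in self.linear_sum]
        self.squared_sum = [v * factor for v in self.squared_sum]
        self.last_update = now

    def insert(self, point, now):
        # Age the existing content first, then add the new point with weight 1.
        self._fade(now)
        self.weight += 1.0
        for i, x in enumerate(point):
            self.linear_sum[i] += x
            self.squared_sum[i] += x * x

    def mean(self):
        # Decayed centroid; only defined once at least one point was inserted.
        return [v / self.weight for v in self.linear_sum]


summary = DecayedSummary(dim=1, decay_rate=1.0)
summary.insert([0.0], now=0.0)
summary.insert([3.0], now=1.0)
print(summary.weight, summary.mean())
```

With `decay_rate=1.0`, the first point has lost half its weight by the time the second arrives, so the centroid is pulled toward the newer point. A full stream clusterer would maintain many such summaries and decide per object which one to update.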