A growing number of real-world applications deal with multiple evolving data streams. This paper proposes a framework for clustering evolving data streams that takes advantage of recent-biased approximation. In recent-biased approximation, more detail is preserved for recent data while fewer coefficients are kept for the data stream as a whole, which greatly improves both clustering efficiency and space usage. Our framework consists of two phases: an online phase, which approximates the data streams and incrementally maintains summary statistics, and an offline clustering phase, which can perform dynamic clustering over the data streams on all possible time horizons. As shown by complexity analysis and validated by our empirical studies, the framework performs efficiently in the data stream environment while producing clustering results of very high quality.
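
To make the idea of recent-biased approximation concrete, the following is a minimal sketch of one possible online summary. It is not the approximation used in the paper, whose details the abstract does not give. It assumes a simple scheme in which each stream is summarized by segment means and older segments cover exponentially more points. The class name RecentBiasedSummary and the parameter per_level are hypothetical.

```python
from collections import deque


class RecentBiasedSummary:
    """Illustrative only: keeps (count, sum) segments, coarser for older data."""

    def __init__(self, per_level=2):
        # Assumed parameter: maximum number of segments kept at each level.
        self.per_level = per_level
        # levels[i] holds segments that each cover 2**i points, newest first.
        self.levels = []

    def append(self, value):
        """Online update: add one point, merging old segments as needed."""
        carry = (1, float(value))
        level = 0
        while carry is not None:
            if level == len(self.levels):
                self.levels.append(deque())
            segs = self.levels[level]
            segs.appendleft(carry)
            carry = None
            if len(segs) > self.per_level:
                # Merge the two oldest segments and push them one level up.
                c1, s1 = segs.pop()
                c2, s2 = segs.pop()
                carry = (c1 + c2, s1 + s2)
            level += 1

    def segments(self):
        """Return (count, mean) pairs ordered from newest to oldest."""
        return [(c, s / c) for segs in self.levels for c, s in segs]


summary = RecentBiasedSummary(per_level=2)
for x in range(1, 17):
    summary.append(x)
for count, mean in summary.segments():
    print(count, mean)
```

Under these assumptions the number of stored segments grows logarithmically with the stream length, and an offline phase could compare streams over a chosen time horizon using only the segments that fall inside it.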