Beyond uniformity and independence: analysis of R-trees using the concept of fractal dimension
PODS '94 Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Self-spatial join selectivity estimation using fractal concepts
ACM Transactions on Information Systems (TOIS)
Models and issues in data stream systems
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
F4: large-scale automated forecasting using fractals
Proceedings of the eleventh international conference on Information and knowledge management
Using Self-Similarity to Cluster Large Data Sets
Data Mining and Knowledge Discovery
Clustering Data Streams: Theory and Practice
IEEE Transactions on Knowledge and Data Engineering
Deflating the Dimensionality Curse Using Multiple Fractal Dimensions
ICDE '00 Proceedings of the 16th International Conference on Data Engineering
A framework for diagnosing changes in evolving data streams
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Accurate decision trees for mining high-speed data streams
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
The time diversification monitoring of a stock portfolio: an approach based on the fractal dimension
Proceedings of the 2004 ACM symposium on Applied computing
Finding (Recently) Frequent Items in Distributed Data Streams
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Measuring evolving data streams' behavior through their intrinsic dimension
New Generation Computing
SRF: a framework for the study of classifier behavior under training set mislabeling noise
PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I
Data streams are fundamental in many data processing applications that involve large amounts of data generated continuously as a sequence of events. Frequently, such events are not stored: the data are analyzed and queried as they arrive and then discarded right away. In many applications, these events are represented by a predetermined number of numerical attributes, so, without loss of generality, we can consider events as elements of a multidimensional domain. A sequence of events in a data stream can be characterized by its intrinsic dimension, which in multidimensional datasets is usually lower than the embedding dimensionality. Since the intrinsic dimension can be used to improve the performance of algorithms that handle multidimensional data (especially query optimization), measuring it is also relevant to improving data stream processing and analysis. Moreover, it can be useful for forecasting data behavior. Hence, we present an algorithm that measures the intrinsic dimension of a data stream on the fly, following its continuously changing behavior. We also present experimental studies, using both real and synthetic data streams, showing that the results on well-understood datasets closely follow what is expected from the known behavior of the data.
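The abstract does not spell out how the intrinsic dimension is computed. As an illustration only, the sketch below assumes one common fractal-based estimator, the box-counting correlation fractal dimension D2: the space is cut into grid cells of side r, S(r) is the sum over cells of the squared number of events in each cell, and D2 is the slope of log S(r) versus log r. It keeps that estimate current over a sliding window of the most recent events. This is not presented as the authors' algorithm; the class name `StreamingCorrelationDimension`, the window size, the grid levels, and the assumption that events are already normalized to the unit hypercube [0, 1)^E are all hypothetical choices.

```python
import math
from collections import Counter, deque


def _slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


class StreamingCorrelationDimension:
    """Hypothetical sliding-window estimate of the correlation fractal dimension D2.

    Assumes each event is a tuple of floats normalized to [0, 1)^E.
    """

    def __init__(self, window_size, levels=range(1, 6)):
        self.window_size = window_size
        self.levels = list(levels)  # grid level k uses cells of side r = 2**-k
        self.window = deque()
        self.counts = {k: Counter() for k in self.levels}  # events per cell
        self.sum_sq = {k: 0 for k in self.levels}  # S(r) = sum of count**2 over cells

    def _cell(self, point, k):
        scale = 2 ** k
        # Clamp so a coordinate equal to 1.0 still falls in the last cell.
        return tuple(min(int(x * scale), scale - 1) for x in point)

    def _update(self, point, delta):
        """Add (delta=+1) or remove (delta=-1) one event from every grid level."""
        for k in self.levels:
            cell = self._cell(point, k)
            c = self.counts[k][cell]
            # (c + 1)**2 - c**2 = 2c + 1   and   c**2 - (c - 1)**2 = 2c - 1
            self.sum_sq[k] += (2 * c + 1) if delta > 0 else -(2 * c - 1)
            self.counts[k][cell] = c + delta
            if self.counts[k][cell] == 0:
                del self.counts[k][cell]

    def add(self, point):
        """Insert a new event, evicting the oldest one when the window is full."""
        if len(self.window) == self.window_size:
            self._update(self.window.popleft(), -1)
        self.window.append(point)
        self._update(point, +1)

    def dimension(self):
        """Slope of log S(r) versus log r over the configured grid levels."""
        if not self.window:
            raise ValueError("window is empty")
        xs = [math.log(2.0 ** -k) for k in self.levels]
        ys = [math.log(self.sum_sq[k]) for k in self.levels]
        return _slope(xs, ys)
```

For example, feeding 1024 evenly spaced points on the diagonal (x, x) yields an estimate of 1 (a line embedded in 2-D), and after the window is refilled with a regular 32 x 32 grid of points the estimate rises to 2, so the value tracks the changing behavior of the stream. Each event costs one cell update per grid level, because S(r) is maintained incrementally instead of being recomputed from the whole window.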