The dimension of a dataset has a major impact on database management tasks such as indexing and query processing. The embedding dimension (i.e., the number of attributes of the dataset) usually overestimates the actual contribution of the attributes to the main characteristics of the data, because the typical assumptions of uniform distribution and independence between attributes usually do not hold. In fact, due to dependencies and correlations between attributes, real data are typically skewed and exhibit an intrinsic dimensionality much lower than the embedding dimension. The intrinsic dimension can likewise be applied to improve data stream processing and analysis. Data streams are generated as sequences of events, each represented by a predetermined number of numerical attributes. Thus, without loss of generality, we can consider events as points in a multidimensional domain whose dimensionality equals the number of attributes. This paper presents a fast, linear algorithm to measure the intrinsic dimension of a data stream on the fly, tracking its continuously changing behavior. Experimental studies show that the intrinsic dimension can be used to analyze attribute correlations, and the results on well-understood datasets closely follow what is expected from the known behavior of the data.
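To make the idea of an intrinsic dimension that is measured on the fly more concrete, here is a minimal sketch. It assumes that the intrinsic dimension is estimated as the correlation fractal dimension D2 by box counting. The count S(r), the sum of squared cell occupancies over a grid of cell side r, grows roughly as r^D2, so D2 is the slope of log S(r) versus log r. Several parts of the sketch are hypothetical illustrative choices rather than the paper's algorithm: the class name `StreamingCorrelationDimension`, the parameters `window` and `levels`, the sliding-window expiry, and the assumption that every attribute is pre-normalized to [0, 1). Each event updates one counter per grid level, so the total cost is linear in the number of events.

```python
import math
import random
from collections import Counter, deque
from typing import Deque, List, Tuple

Point = Tuple[float, ...]


class StreamingCorrelationDimension:
    """Sliding-window estimate of the correlation fractal dimension D2.

    Hypothetical sketch: attributes are assumed normalized to [0, 1);
    `window` and `levels` are illustrative parameters, not the paper's.
    """

    def __init__(self, window: int, levels: int = 4) -> None:
        self.window = window
        self.levels = levels
        # One sparse grid per resolution r = 2**-(j + 1), j = 0 .. levels-1.
        self.grids: List[Counter] = [Counter() for _ in range(levels)]
        # Running S(r) = sum over cells of (points in cell)**2, per level.
        self.sq_sums: List[int] = [0] * levels
        self.buffer: Deque[Point] = deque()

    def _cell(self, p: Point, level: int) -> Tuple[int, ...]:
        k = 2 ** (level + 1)  # cells per axis at this level
        return tuple(min(int(x * k), k - 1) for x in p)

    def _update(self, p: Point, delta: int) -> None:
        # (c + d)**2 - c**2 == 2*c*d + d**2, so S(r) is updated in O(1) per level.
        for j in range(self.levels):
            key = self._cell(p, j)
            c = self.grids[j][key]
            self.sq_sums[j] += 2 * c * delta + delta * delta
            if c + delta == 0:
                del self.grids[j][key]
            else:
                self.grids[j][key] = c + delta

    def add(self, p: Point) -> None:
        """Insert a new event; expire the oldest one once the window is full."""
        self.buffer.append(p)
        self._update(p, +1)
        if len(self.buffer) > self.window:
            self._update(self.buffer.popleft(), -1)

    def dimension(self) -> float:
        """Least-squares slope of log S(r) against log r over all levels."""
        if not self.buffer:
            raise ValueError("no events in the window")
        xs = [-(j + 1) * math.log(2.0) for j in range(self.levels)]
        ys = [math.log(s) for s in self.sq_sums]
        mx = sum(xs) / len(xs)
        my = sum(ys) / len(ys)
        num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den = sum((x - mx) ** 2 for x in xs)
        return num / den


if __name__ == "__main__":
    rng = random.Random(0)
    est = StreamingCorrelationDimension(window=5000)
    for _ in range(5000):
        t = rng.random()
        est.add((t, t, t))  # embedding dimension 3, fully correlated attributes
    print(round(est.dimension(), 2))  # close to 1, not 3
```

The demo feeds events with three perfectly correlated attributes. The estimate comes out near 1 instead of the embedding dimension 3, which is the kind of gap between embedding and intrinsic dimension that the abstract describes. Because expired events are decremented from the same counters, the estimate follows the stream as its distribution drifts. The range of grid levels matters: it should cover resolutions where the log-log plot is roughly linear. At very fine grids most cells hold at most one point, and the slope flattens.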