Scientific and intelligence applications have special data handling needs. In these settings, data does not fit the standard model of short coded records that has dominated data management for three decades. Array database systems have a specialized architecture to address this problem. Because the data is typically an approximation of reality, these systems must handle imprecision and uncertainty in a way that is both efficient and provably accurate. We propose a discrete approach for representing value distributions and adopt a standard metric from probability theory, the variation distance, to measure the quality of a result distribution. We then propose a novel algorithm with a provable upper bound on the variation distance between its result distribution and the "ideal" one. Complementing this, we advocate a "statistical mode" that suits the results of many queries and applications and is also much more efficient to execute. We show how the statistical mode also opens up interesting predicate evaluation strategies. Finally, we evaluate our algorithms with extensive experiments on real-world datasets.
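The abstract measures result quality by the variation distance between discrete value distributions. The following is a minimal sketch. It assumes each uncertain array cell is stored as a finite map from values to probabilities and that cells are independent. The code computes the exact distribution of the sum of two cells, then the total variation distance between that result and a crude approximation. The functions `convolve_sum` and `prune` are hypothetical names chosen for illustration. `prune` in particular is a simple stand-in approximation and not the paper's algorithm, which the abstract does not describe.

```python
from fractions import Fraction
from itertools import product


def tv_distance(p, q):
    """Total variation distance between two discrete distributions,
    each given as a dict mapping value -> probability.
    Equals half the L1 distance, i.e. the largest difference in
    probability the two distributions can assign to any single event."""
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in support) / 2


def convolve_sum(p, q):
    """Exact distribution of X + Y for independent discrete X ~ p and Y ~ q."""
    out = {}
    for (x, px), (y, py) in product(p.items(), q.items()):
        out[x + y] = out.get(x + y, 0) + px * py
    return out


def prune(p, threshold):
    """Hypothetical approximation (not the paper's algorithm): drop values whose
    probability is below `threshold`, then renormalize the remaining mass.
    Assumes at least one value survives the threshold."""
    kept = {x: px for x, px in p.items() if px >= threshold}
    total = sum(kept.values())
    return {x: px / total for x, px in kept.items()}


if __name__ == "__main__":
    F = Fraction
    cell_a = {1: F(1, 2), 2: F(1, 2)}              # uncertain value of one array cell
    cell_b = {0: F(1, 4), 1: F(1, 2), 3: F(1, 4)}  # uncertain value of another cell
    exact = convolve_sum(cell_a, cell_b)           # {1: 1/8, 2: 3/8, 3: 1/4, 4: 1/8, 5: 1/8}
    approx = prune(exact, F(1, 5))                 # keeps values 2 and 3 only
    print(tv_distance(exact, approx))              # prints 3/8
```

Pruning removes probability mass m and rescales the rest by 1/(1 - m), so its variation distance from the exact result is exactly m. Here m = 1/8 + 1/8 + 1/8 = 3/8, which makes a convenient sanity check. Exact fractions keep the arithmetic free of floating-point rounding.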