Querying uncertain data sets, in which each object is represented as a probability distribution, is challenging because of the volume of data involved and the difficulty of comparing uncertainty between distributions. The Earth Mover's Distance (EMD) is increasingly used to compare uncertain data because it effectively captures the differences between two distributions. Computing the EMD, however, requires solving the transportation problem, which is computationally intensive. In this paper, we propose a new lower bound to the EMD and an index structure that together significantly improve the performance of EMD-based K-nearest neighbor (K-NN) queries on uncertain databases. The lower bound approximates the EMD on a projection vector: each distribution is projected onto a vector and approximated by a normal distribution together with an accompanying error term. Each normal is then represented as a point in a Hough-transformed space, where we use the concept of stochastic dominance to build an efficient index structure. We show that our method significantly decreases K-NN query time on uncertain databases and that the index structure scales well with database cardinality. The approach is well suited to heterogeneous data sets, helping to keep EMD-based queries tractable as uncertain data sets become larger and more complex.
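To illustrate why projecting onto a vector can yield a lower bound, the following minimal sketch computes the exact one-dimensional EMD between two projected discrete distributions. It is not the paper's bound: it omits the normal approximation, the error term, and the index, and it assumes a Euclidean ground distance and equal total mass in both distributions. Under those assumptions, projecting onto a unit vector never increases the distance between two points, so the 1D EMD of the projections cannot exceed the full EMD. The function names `project` and `emd_1d` and the sample data are hypothetical and chosen only for illustration.

```python
import math


def project(points, weights, direction):
    """Project weighted points onto the unit vector along `direction`.

    Returns a sorted list of (position, weight) pairs on the projection axis.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    projected = [
        (sum(p_i * u_i for p_i, u_i in zip(p, unit)), w)
        for p, w in zip(points, weights)
    ]
    return sorted(projected)


def emd_1d(a, b):
    """Exact 1D EMD between two discrete distributions of equal total mass.

    In one dimension the EMD equals the area between the two cumulative
    distribution functions, which is accumulated here in a single sweep.
    """
    # Mass from `a` raises the CDF difference; mass from `b` lowers it.
    events = sorted([(x, w) for x, w in a] + [(x, -w) for x, w in b])
    total = 0.0
    cdf_diff = 0.0
    prev_x = events[0][0]
    for x, w in events:
        total += abs(cdf_diff) * (x - prev_x)
        cdf_diff += w
        prev_x = x
    return total


# Hypothetical example: two distributions in the plane, each with total mass 1.
p_points, p_weights = [(0.0, 0.0), (1.0, 0.0)], [0.5, 0.5]
q_points, q_weights = [(0.0, 1.0), (1.0, 1.0)], [0.5, 0.5]

# The full EMD here is 1.0: every point of P is at least distance 1 from
# every point of Q, and moving each half unit of mass straight up achieves it.
lb_vertical = emd_1d(
    project(p_points, p_weights, (0.0, 1.0)),
    project(q_points, q_weights, (0.0, 1.0)),
)
lb_horizontal = emd_1d(
    project(p_points, p_weights, (1.0, 0.0)),
    project(q_points, q_weights, (1.0, 0.0)),
)
print(lb_vertical, lb_horizontal)  # 1.0 0.0
```

Both values are valid lower bounds on the full EMD of 1.0, but only the vertical projection is tight, which suggests why the choice of projection vector matters when such a bound is used to prune candidates in a K-NN query.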