Probabilistic data is arriving as a new deluge, driven by technical advances in geographical tracking, multimedia processing, sensor networks, and RFID. Similarity search is an important functionality for manipulating probabilistic data, but it poses new challenges for traditional relational databases. The problem stems from the limited effectiveness of the distance metrics supported by existing database systems. At the same time, some more complicated distance operators have proven valuable for their better distinguishing ability in the probabilistic domain. In this paper, we discuss the similarity search problem with the Earth Mover's Distance, which is the most successful distance metric on probabilistic histograms but also an expensive operator with cubic complexity. We present a new database approach to answering range queries and k-nearest neighbor queries on probabilistic data on the basis of the Earth Mover's Distance. Our solution uses the primal-dual theory of linear programming and deploys B+ tree index structures for effective candidate pruning. Extensive experiments show that our proposal dramatically improves the scalability of probabilistic databases.
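
To make the pruning idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method of the paper. It assumes one-dimensional histograms over the same ordered bins with ground distance |i - j|, for which the Earth Mover's Distance has a simple closed form (the general case needs a linear-programming or flow solver). By weak duality, any weight vector phi with |phi[i] - phi[j]| <= |i - j| is a feasible dual solution, so |key(p) - key(q)| never exceeds EMD(p, q), where key(h) is the phi-weighted sum of h. A range query with threshold theta therefore only needs to inspect records whose key lies within theta of the query's key. The sorted list searched with `bisect` stands in for a B+ tree, and the names `emd_1d`, `key`, and `range_query` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def emd_1d(p, q):
    """EMD of two equal-mass histograms on the same ordered bins,
    assuming ground distance |i - j| (closed form for the 1-D case)."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    # Sum of absolute differences of the two cumulative distributions.
    return sum(abs(c) for c in accumulate(a - b for a, b in zip(p, q)))


def key(hist, phi):
    """One-dimensional index key: the dual objective sum(phi[i] * hist[i])."""
    return sum(f * h for f, h in zip(phi, hist))


def range_query(data, q, theta, phi, eps=1e-9):
    """Return (ids with EMD <= theta, number of candidates verified).

    Assumes phi is 1-Lipschitz on bin indices, so |key(p) - key(q)| is a
    lower bound on EMD(p, q). The sorted list below is a stand-in for a
    B+ tree; a real system would build it once, not per query.
    """
    index = sorted((key(h, phi), i) for i, h in enumerate(data))
    keys = [k for k, _ in index]
    center = key(q, phi)
    # eps gives a little slack against floating-point rounding in the keys.
    lo = bisect_left(keys, center - theta - eps)
    hi = bisect_right(keys, center + theta + eps)
    candidates = [i for _, i in index[lo:hi]]
    # Only the surviving candidates pay for an exact EMD computation.
    hits = sorted(i for i in candidates if emd_1d(data[i], q) <= theta)
    return hits, len(candidates)
```

With `phi = [0, 1, 2, 3]` on four bins, a query is answered by one key-range scan followed by exact verification of the survivors. The filter never causes a false dismissal, because the key difference is a lower bound on the true distance. How many records it prunes depends on the choice of phi.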