Uncertain or imprecise data are pervasive in applications such as location-based services, sensor monitoring, and data collection and integration. For these applications, probabilistic databases can be used to store uncertain data, and querying facilities are provided to yield answers with statistical confidence. Given that only a limited amount of resources is available to "clean" the database (e.g., by probing some sensors to obtain their latest values), we address the problem of choosing the set of uncertain objects to be cleaned so as to achieve the best improvement in the quality of query answers. For this purpose, we present the PWS-quality metric, a universal measure that quantifies the ambiguity of query answers under the possible-world semantics. We study how PWS-quality can be efficiently evaluated for two major query classes: (1) queries that examine the satisfiability of each tuple independently of other tuples (e.g., range queries); and (2) queries that require knowledge of the relative ranking of the tuples (e.g., MAX queries). We then propose a polynomial-time solution that achieves an optimal improvement in PWS-quality. Other fast heuristics are presented as well. Experiments performed on both real and synthetic datasets show that the PWS-quality metric can be evaluated quickly, and that our cleaning algorithm provides an optimal solution with high efficiency. To the best of our knowledge, this is the first work that develops a quality metric for a probabilistic database and investigates how such a metric can be used for data cleaning purposes.
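To make the idea of "ambiguity of query answers under the possible-world semantics" concrete, the sketch below enumerates every possible world of a tiny uncertain database, groups the worlds by the answer a query returns, and scores the resulting answer distribution. The abstract does not give the formula for PWS-quality, so the score used here (the negated Shannon entropy of the answer distribution, where 0 means no ambiguity) is an assumption made for illustration. The data model (independent objects with discrete value distributions) and the names `pws_quality`, `range_query`, and `max_query` are likewise hypothetical. The enumeration is exponential in the number of objects and is not the efficient evaluation the abstract refers to.

```python
from collections import defaultdict
from itertools import product
from math import log2


def pws_quality(objects, query):
    """Score the ambiguity of a query answer over all possible worlds.

    objects: dict mapping an object name to a list of (value, probability)
             alternatives; objects are assumed independent (illustrative model).
    query:   function from a world (dict name -> value) to a hashable answer.
    Returns the negated entropy of the answer distribution (assumed metric):
    0.0 when every world yields the same answer, more negative otherwise.
    """
    names = list(objects)
    answer_prob = defaultdict(float)
    # Each combination of one alternative per object is one possible world.
    for combo in product(*(objects[n] for n in names)):
        world = {n: value for n, (value, _) in zip(names, combo)}
        prob = 1.0
        for _, p in combo:
            prob *= p
        answer_prob[query(world)] += prob
    return sum(p * log2(p) for p in answer_prob.values() if p > 0)


def range_query(lo, hi):
    """Class (1): each object is tested independently of the others."""
    return lambda world: frozenset(n for n, v in world.items() if lo <= v <= hi)


def max_query(world):
    """Class (2): the answer depends on the relative ranking of objects."""
    return max(world, key=world.get)


# Two hypothetical sensors; s1 is uncertain, s2 is already known exactly.
sensors = {"s1": [(20, 0.5), (30, 0.5)], "s2": [(25, 1.0)]}
# "Cleaning" s1 is modeled here as probing it and observing the value 30.
cleaned = {"s1": [(30, 1.0)], "s2": [(25, 1.0)]}

before = pws_quality(sensors, range_query(22, 35))
after = pws_quality(cleaned, range_query(22, 35))
print(before, after)
```

In this toy setting, probing `s1` removes all ambiguity from the range query, so the score rises to 0. Choosing which objects to probe under a limited budget, which is the problem the abstract addresses, is not attempted in this sketch.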