Mobile and pervasive applications frequently rely on devices such as RFID antennas or sensors (light, temperature, motion) to provide them with information about the physical world. These devices, however, are unreliable. They produce streams of information in which portions of the data may be missing, duplicated, or erroneous. The current state of the art is to correct errors locally (e.g., range constraints for temperature readings) or to use spatial/temporal correlations (e.g., smoothing temperature readings). However, errors are often apparent only in a global setting, e.g., missed readings of objects that are known to be present, or exit readings from a parking garage without matching entry readings.

In this paper, we present StreamClean, a system for correcting input data errors automatically using application-defined global integrity constraints. Because it is frequently impossible to make corrections with certainty, we propose a probabilistic approach, where the system assigns to each input tuple the probability that it is correct.

We show that StreamClean handles a large class of input data errors, and corrects them sufficiently fast to keep up with the input rates of many mobile and pervasive applications. We also show that the probabilities assigned by StreamClean correspond to a user's intuitive notion of correctness.
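
To make the parking-garage example concrete, the following is a minimal sketch of checking one global integrity constraint: every exit reading should be preceded by a matching entry reading for the same car. It is only an illustration and not StreamClean's algorithm. The `(timestamp, car_id, kind)` tuple layout and the function name `unmatched_exits` are assumptions introduced here, and the sketch merely flags violations rather than assigning correctness probabilities to tuples.

```python
from collections import Counter


def unmatched_exits(readings):
    """Return exit readings with no earlier unmatched entry for the same car.

    `readings` is an iterable of (timestamp, car_id, kind) tuples, where kind
    is "entry" or "exit". This layout is hypothetical, chosen for illustration.
    """
    inside = Counter()  # number of unmatched entries seen so far, per car
    violations = []
    for ts, car, kind in sorted(readings):  # process in timestamp order
        if kind == "entry":
            inside[car] += 1
        elif kind == "exit":
            if inside[car] > 0:
                inside[car] -= 1  # this exit matches an earlier entry
            else:
                violations.append((ts, car, kind))  # constraint violated
    return violations


readings = [
    (1, "A", "entry"),
    (2, "B", "exit"),   # no entry was ever read for car B
    (3, "A", "exit"),
    (4, "A", "exit"),   # second exit for car A without a second entry
]
print(unmatched_exits(readings))
```

A local check such as a range constraint cannot catch either flagged reading, because each looks valid in isolation. The violation shows up only when the stream is examined as a whole. Note also that a violation does not say which tuple is wrong: the exit may be spurious, or the entry reading may have been missed, which is why a correction cannot always be made with certainty.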