A Sampling-Based Approach to Information Recovery

Authors:
Junyi Xie;Jun Yang;Yuguo Chen;Haixun Wang;Philip S. Yu
Affiliations:
Oracle Corporation, Redwood City, California, USA. junyi.xie@oracle.com;Department of Computer Science, Duke University, Durham, North Carolina, USA. junyang@cs.duke.edu;Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, Illinois, USA. yuguo@uiuc.edu;IBM T. J. Watson Research Center, Hawthorne, New York, USA. haixun@cs.duke.edu;IBM T. J. Watson Research Center, Hawthorne, New York, USA. psyu@cs.duke.edu
Venue:
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Year:
2008

Citing: 0
Cited by: 12

MCDB: a monte carlo approach to managing uncertain data
Proceedings of the 2008 ACM SIGMOD international conference on Management of data

Tagmark: reliable estimations of RFID tags for business processes
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining

Finding misplaced items in retail by clustering RFID data
Proceedings of the 13th International Conference on Extending Database Technology

Leveraging spatio-temporal redundancy for RFID data cleansing
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data

DCUBE: CUBE on dirty databases
WAIM'10 Proceedings of the 11th international conference on Web-age information management

Distributed inference and query processing for RFID tracking and monitoring
Proceedings of the VLDB Endowment

The optimal k-covering tag deployment for RFID-based localization
Journal of Network and Computer Applications

Querying uncertain data with aggregate constraints
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

Leveraging communication information among readers for RFID data cleaning
WAIM'11 Proceedings of the 12th international conference on Web-age information management

KLEAP: an efficient cleaning method to remove cross-reads in RFID streams
Proceedings of the 20th ACM international conference on Information and knowledge management

X-CleLo: intelligent deterministic RFID data and event transformer
Personal and Ubiquitous Computing

Reasoning about RFID-tracked moving objects in symbolic indoor spaces
Proceedings of the 25th International Conference on Scientific and Statistical Database Management

Abstract

There has been a recent resurgence of interest in research on noisy and incomplete data. Many applications require information to be recovered from such data. Ideally, an approach for information recovery should have the following features. First, it should be able to incorporate prior knowledge about the data, even if such knowledge is in the form of complex distributions and constraints for which no closed-form solutions exist. Second, it should be able to capture complex correlations and quantify the degree of uncertainty in the recovered data, and further support queries over such data. The database community has developed a number of approaches for information recovery, but none is general enough to offer all of the above features. To overcome these limitations, we take a significantly more general approach to information recovery based on sampling. We apply sequential importance sampling, a technique from statistics that works for complex distributions and dramatically outperforms naive sampling when data is constrained. We illustrate the generality and efficiency of this approach in two application scenarios: cleansing RFID data, and recovering information from published data that has been summarized and randomized for privacy.
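
The contrast between naive sampling and sequential importance sampling under a constraint can be illustrated with a toy problem. The sketch below is not the paper's algorithm; it assumes a hypothetical setting in which each of n binary values has an independent prior probability of being 1, and a published summary fixes their sum to k. Naive sampling draws from the prior and discards every draw that violates the constraint. The sequential sampler fills in one value at a time, forces a value whenever the constraint leaves no choice, and corrects for the forcing with an importance weight. All function names and parameters are illustrative.

```python
import itertools
import random


def exact_marginals(probs, k):
    """Brute-force P(x_i = 1 | sum(x) = k); only feasible for small n."""
    n = len(probs)
    total = 0.0
    marg = [0.0] * n
    for x in itertools.product((0, 1), repeat=n):
        if sum(x) != k:
            continue
        w = 1.0
        for xi, p in zip(x, probs):
            w *= p if xi else 1.0 - p
        total += w
        for i, xi in enumerate(x):
            marg[i] += w * xi
    return [m / total for m in marg]


def naive_marginals(probs, k, n_samples, rng):
    """Sample from the prior and reject draws that violate sum(x) = k."""
    counts = [0] * len(probs)
    kept = 0
    for _ in range(n_samples):
        x = [1 if rng.random() < p else 0 for p in probs]
        if sum(x) == k:
            kept += 1
            for i, xi in enumerate(x):
                counts[i] += xi
    marg = [c / kept for c in counts] if kept else None
    return marg, kept / n_samples


def sis_marginals(probs, k, n_samples, rng):
    """Sequential importance sampling: every draw satisfies sum(x) = k."""
    n = len(probs)
    weight_sum = 0.0
    weighted = [0.0] * n
    for _ in range(n_samples):
        remaining = k          # ones still to be placed
        weight = 1.0           # prior(x) / proposal(x)
        x = []
        for i, p in enumerate(probs):
            slots = n - i      # positions left, including this one
            if remaining == 0:
                xi = 0                 # forced: proposal probability 1
                weight *= 1.0 - p      # prior probability of the forced value
            elif remaining == slots:
                xi = 1                 # forced: proposal probability 1
                weight *= p
            else:
                # Free step: proposal equals prior, so the weight is unchanged.
                xi = 1 if rng.random() < p else 0
            remaining -= xi
            x.append(xi)
        weight_sum += weight
        for i, xi in enumerate(x):
            weighted[i] += weight * xi
    # Self-normalized importance sampling estimate of the marginals.
    return [w / weight_sum for w in weighted]


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.1, 0.3, 0.5, 0.7, 0.2, 0.4, 0.6, 0.8]
    print("exact:", exact_marginals(probs, 3))
    print("SIS:  ", sis_marginals(probs, 3, 5000, rng))
    print("naive:", naive_marginals(probs, 3, 5000, rng))
```

In this sketch the naive sampler wastes every draw whose sum differs from k, so its acceptance rate shrinks as the constraint becomes less likely under the prior, while the sequential sampler spends no draws on infeasible configurations. The price is that its draws carry unequal weights, so a proposal that strays far from the target distribution would yield few effective samples.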