A deferred cleansing method for RFID data analytics

  • Authors:
  • Jun Rao;Sangeeta Doraiswamy;Hetal Thakkar;Latha S. Colby

  • Affiliations:
  • IBM Almaden Research Center;IBM Almaden Research Center;UCLA;IBM Almaden Research Center

  • Venue:
  • VLDB '06 Proceedings of the 32nd international conference on Very large data bases
  • Year:
  • 2006

Quantified Score

Hi-index 0.02

Visualization

Abstract

Radio Frequency Identification is gaining broader adoption in many areas. One of the challenges in implementing an RFID-based system is dealing with anomalies in RFID reads. A small number of anomalies can translate into large errors in analytical results. Conventional "eager" approaches cleanse all data upfront and then apply queries on cleaned data. However, this approach is not feasible when several applications define anomalies and corrections on the same data set differently and not all anomalies can be defined beforehand. This necessitates anomaly handling at query time. We introduce a deferred approach for detecting and correcting RFID data anomalies. Each application specifies the detection and the correction of relevant anomalies using declarative sequence-based rules. An application query is then automatically rewritten based on the cleansing rules that the application has specified, to provide answers over cleaned data. We show that a naive approach to deferred cleansing that applies rules without leveraging query information can be prohibitive. We develop two novel rewrite methods, both of which reduce the amount of data to be cleaned, by exploiting predicates in application queries while guaranteeing correct answers. We leverage standardized SQL/OLAP functionality to implement rules specified in a declarative sequence-based language. This allows efficient evaluation of cleansing rules using existing query processing capabilities of a DBMS. Our experimental results show that deferred cleansing is affordable for typical analytic queries over RFID data.