Representing Data Quality for Streaming and Static Data

  • Authors:
  • Anja Klein; Hong-Hai Do; Gregor Hackenbroich; Marcel Karnstedt; Wolfgang Lehner

  • Affiliations:
  • SAP Research CEC Dresden, SAP AG, Germany. anja.klein@sap.com
  • SAP Research CEC Dresden, SAP AG, Germany. hong-hai.do@sap.com
  • SAP Research CEC Dresden, SAP AG, Germany. gregor.hackenbroich@sap.com
  • Department of Computer Science and Automation, TU Ilmenau, Germany. marcel.karnstedt@tu-ilmenau.de
  • Database Technology Group, TU Dresden, Germany. wolfgang.lehner@tu-dresden.de

  • Venue:
  • ICDEW '07 Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering Workshop
  • Year:
  • 2007

Abstract

In smart item environments, a multitude of sensors is deployed to capture data about product conditions and usage, guiding business decisions as well as production automation processes. A major issue in this application area is the restricted quality of sensor data caused by limited sensor precision as well as sensor failures and malfunctions. Decisions based on incorrect or misleading sensor data are likely to be faulty. How to efficiently provide applications with information about data quality (DQ) is still an open research problem. In this paper, we present a flexible model for the efficient transfer and management of data quality for streaming as well as static data. We propose a data stream metamodel that allows data quality to be propagated from the sensors up to the respective business application without significant data overhead. Furthermore, we present an extension of the traditional RDBMS metamodel that permits the persistent storage of data quality information in a relational database. Finally, we demonstrate a data quality metadata mapping to close the gap between the streaming environment and the target database. Our solution maintains a flexible number of DQ dimensions and supports applications that directly consume streaming data or process data stored in a persistent database.
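
To make the idea of a stream-to-database DQ mapping more concrete, the following is a minimal sketch and not the authors' metamodel. It assumes, purely for illustration, that DQ values are attached to a group of consecutive readings rather than to each reading, which is one possible way to limit overhead. The class `QualityWindow`, the table names `dq_window`, `reading`, and `dq_value`, and the dimension names used in the example are all hypothetical.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class QualityWindow:
    """Hypothetical stream unit: consecutive readings sharing one set of DQ values."""
    sensor_id: str
    readings: list                                # list of (timestamp, value) pairs
    quality: dict = field(default_factory=dict)   # DQ dimension name -> value


def create_schema(conn):
    # Assumed relational layout: DQ values live in their own table, keyed by
    # dimension name, so the number of dimensions is not fixed by the schema.
    conn.executescript("""
        CREATE TABLE dq_window (
            id INTEGER PRIMARY KEY,
            sensor_id TEXT, t_start INTEGER, t_end INTEGER);
        CREATE TABLE reading (
            window_id INTEGER REFERENCES dq_window(id),
            ts INTEGER, value REAL);
        CREATE TABLE dq_value (
            window_id INTEGER REFERENCES dq_window(id),
            dimension TEXT, value REAL);
    """)


def store_window(conn, window):
    """Map one stream window (readings plus DQ metadata) to relational rows."""
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO dq_window (sensor_id, t_start, t_end) VALUES (?, ?, ?)",
        (window.sensor_id, window.readings[0][0], window.readings[-1][0]),
    )
    window_id = cur.lastrowid
    cur.executemany(
        "INSERT INTO reading (window_id, ts, value) VALUES (?, ?, ?)",
        [(window_id, ts, v) for ts, v in window.readings],
    )
    cur.executemany(
        "INSERT INTO dq_value (window_id, dimension, value) VALUES (?, ?, ?)",
        [(window_id, dim, q) for dim, q in window.quality.items()],
    )
    conn.commit()
    return window_id


def quality_of(conn, sensor_id, ts):
    """Look up the stored DQ values that cover a given reading timestamp."""
    rows = conn.execute(
        """SELECT q.dimension, q.value
           FROM dq_value q JOIN dq_window w ON w.id = q.window_id
           WHERE w.sensor_id = ? AND ? BETWEEN w.t_start AND w.t_end""",
        (sensor_id, ts),
    ).fetchall()
    return dict(rows)
```

In this sketch, an application reading from the database calls `quality_of` to retrieve the DQ values that apply to a stored reading, while a streaming consumer could inspect `QualityWindow.quality` directly. Adding a further DQ dimension only adds rows to `dq_value` and requires no schema change.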