Towards long term data quality in a large scale biometrics experiment

  • Authors:
  • Hoang Bui;Diane Wright;Clarence Helm;Rachel Witty;Patrick Flynn;Douglas Thain

  • Affiliations:
  • University of Notre Dame;University of Notre Dame;University of Notre Dame;University of Notre Dame;University of Notre Dame;University of Notre Dame

  • Venue:
  • Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
  • Year:
  • 2010

Abstract

Data quality plays a very important role in any scientific research. In this paper we present some of the challenges that we face in managing and maintaining data quality for a terabyte-scale biometrics repository. We have developed a step-by-step model to capture, ingest, validate, and prepare data for biometrics research. During these processes, many hidden errors can be introduced into the data. Those errors can affect the overall quality of the data and thus skew the results of biometrics research. We discuss the steps we have taken to reduce and eliminate these errors. Steps such as data replication, automated data validation, and logging of metadata changes are crucial to improving the quality and reliability of our data.