Improving Data Quality in Practice: A Case Study in the Italian Public Administration

  • Authors:
  • P. Missier; G. Lalk; V. Verykios; F. Grillo; T. Lorusso; P. Angeletti

  • Affiliations:
  • Applied Research, Telcordia Technologies, Morristown, NJ, USA; Applied Research, Telcordia Technologies, Morristown, NJ, USA; College of Information Science and Technology, Drexel University, Philadelphia, PA, USA; Italian Ministry of Finances, Italy; Italian Ministry of Finances, Italy; SO.GE.I, Roma, Italy

  • Venue:
  • Distributed and Parallel Databases
  • Year:
  • 2003

Abstract

Assessing and improving the quality of data stored in information systems are both important and difficult tasks. For an increasing number of companies that rely on information as one of their most important assets, enforcing high data quality levels represents a strategic investment aimed at preserving the value of those assets. For a public administration or a government, good data quality translates into good service and good relationships with citizens. Achieving high quality standards, however, is a major task because of the variety of ways in which errors can be introduced into a system and the difficulty of correcting them systematically. Problems with data quality tend to fall into two categories. The first is inconsistency among systems, such as format, syntax, and semantic inconsistencies. The second is inconsistency with reality, as exemplified by missing, obsolete, and incorrect data values and by outliers.

In this paper, we describe a real-life case study on assessing and improving the quality of data in the Italian Public Administration. The domain of study is taxpayers' data maintained by the Italian Ministry of Finances. In this context, we provide the Administration with a quantitative assessment of specific problems such as record duplication and address mismatch and obsolescence, we suggest a set of guidelines for setting precise quality improvement goals, and we illustrate analysis techniques for achieving those goals. Our guidelines emphasize the importance of data flow analysis and of the definition of measurable quality indicators. The quality indicators that we propose are generic and can be used to describe a variety of data quality problems, thus representing a possible reference framework for practitioners. Finally, we investigate ways to partially automate the analysis of the causes of poor data quality.
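To make the idea of a measurable quality indicator concrete, the following is a minimal sketch of two simple indicators: the share of records with a non-empty address, and the share of records that are redundant copies of another record. It is an illustration only and is not taken from the paper. The field names, the sample records, and the exact-match-after-normalization rule for detecting duplicates are all assumptions; real record matching would typically need approximate comparison.

```python
from collections import Counter


def normalize(value):
    """Lowercase and collapse whitespace so trivial spelling variants compare equal."""
    return " ".join(value.lower().split())


def completeness(records, field):
    """Fraction of records that carry a non-empty value for `field`."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if (r.get(field) or "").strip())
    return filled / len(records)


def duplication_rate(records, key_fields):
    """Fraction of records that are redundant copies under a normalized exact-match key.

    Assumption: two records refer to the same entity when all key fields match
    after normalization. This is a deliberately simple stand-in for real matching.
    """
    if not records:
        return 0.0
    keys = Counter(
        tuple(normalize(r.get(f) or "") for f in key_fields) for r in records
    )
    redundant = sum(count - 1 for count in keys.values())
    return redundant / len(records)


# Hypothetical sample data, invented for illustration.
records = [
    {"name": "Mario Rossi", "birth_date": "1960-01-15", "address": "Via Roma 1, Roma"},
    {"name": "mario  rossi", "birth_date": "1960-01-15", "address": ""},
    {"name": "Anna Bianchi", "birth_date": "1972-07-03", "address": "Corso Italia 22, Milano"},
    {"name": "Luca Verdi", "birth_date": "1985-11-30", "address": None},
]

address_completeness = completeness(records, "address")
duplicate_share = duplication_rate(records, ["name", "birth_date"])
print(f"address completeness: {address_completeness:.2f}")
print(f"duplicate share: {duplicate_share:.2f}")
```

Expressing each indicator as a ratio between 0 and 1 makes it possible to state an improvement goal as a target value and to track it over time, which is in the spirit of the guidelines summarized above.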