A Method for Automatic Discovery of Reference Data

  • Authors:
  • Lukasz Ciszak

  • Affiliations:
  • Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland 00-665

  • Venue:
  • IEA/AIE '09 Proceedings of the 22nd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems: Next-Generation Applied Intelligence
  • Year:
  • 2009

Abstract

Data quality assessment consists of several phases, the first of which is data profiling. This step yields the most current metadata describing the examined data set. We present a method for the automatic discovery of reference data for textual attributes. The method combines textual similarity with the characteristics of the attribute value distribution, and it can discover the correct reference values even when the data contain a large number of impurities. Experiments on real address data show that the method effectively discovers the current reference data.
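
The abstract does not give the algorithm itself, so the following is only a minimal sketch of how textual similarity and value frequency might be combined. It is not the author's method. The function names, the `difflib` similarity measure, and both thresholds (`sim_threshold`, `min_share`) are assumptions chosen for illustration: frequent values are accepted as reference values, and values that closely resemble an already accepted reference value are treated as its misspelled variants.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] (illustrative choice of measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def discover_reference_values(values, sim_threshold=0.8, min_share=0.05):
    """Hypothetical sketch: pick likely reference values for a textual attribute.

    Values are visited from most to least frequent. A value is skipped when it
    is textually close to an already accepted reference value (assumed to be a
    variant of it), or when its share of all records is below min_share.
    """
    counts = Counter(values)
    total = sum(counts.values())
    reference = []
    for value, count in counts.most_common():
        if any(similarity(value, ref) >= sim_threshold for ref in reference):
            continue  # likely a misspelling of an accepted reference value
        if count / total >= min_share:
            reference.append(value)
    return reference


# Made-up city-name column with a few misspellings (not data from the paper).
cities = (
    ["Warszawa"] * 50 + ["Krakow"] * 30 + ["Gdansk"] * 13
    + ["Warszwa"] * 3 + ["Warsawa"] * 2 + ["Krakw"] * 2
)
reference_cities = discover_reference_values(cities)
```

On this toy input the misspelled variants are absorbed by their frequent neighbours, so only the three well-represented spellings remain. Both thresholds would need tuning on real data.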