Polishing Blemishes: Issues in Data Correction

  • Authors: Choh Man Teng
  • Venue: IEEE Intelligent Systems
  • Year: 2004

Abstract

Data quality is crucial to any data-analysis task, yet blemishes in data can arise from many sources. We must therefore understand data imperfections and how effective the various techniques for handling them are. The author compares three approaches: robust algorithms, which tolerate some corruption; filtering, which eliminates noisy instances from the input; and polishing, which corrects noisy instances rather than removing them. The author argues that polishing has theoretical advantages over the first two approaches and can achieve better results. The author also discusses how to evaluate and validate data-correction methods, identifying pitfalls in metric design and suggesting how to design metrics that accurately reflect the extent of correction.
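
To make the difference between filtering and polishing concrete, here is a minimal toy sketch in Python. It is not the author's algorithm: the abstract gives no implementation details, so the detector (a leave-one-out nearest-neighbour vote), the one-dimensional data, the restriction to class labels, and all function names are assumptions chosen for illustration only. The sketch flags instances whose label disagrees with their neighbours, then either drops them (filtering) or rewrites the label (polishing), and finally counts how many corrupted labels were restored to their clean value.

```python
from collections import Counter


def loo_vote(points, labels, i, k=3):
    """Majority label among the k nearest neighbours of instance i, excluding i."""
    others = [j for j in range(len(points)) if j != i]
    # Sort by distance to instance i; break ties by index for determinism.
    others.sort(key=lambda j: (abs(points[j] - points[i]), j))
    votes = Counter(labels[j] for j in others[:k])
    return votes.most_common(1)[0][0]


def flag_suspects(points, labels, k=3):
    """Indices whose recorded label disagrees with the neighbour vote."""
    return [i for i in range(len(points))
            if loo_vote(points, labels, i, k) != labels[i]]


def filter_noise(points, labels, k=3):
    """Filtering: drop every suspect instance from the data."""
    suspects = set(flag_suspects(points, labels, k))
    kept = [i for i in range(len(points)) if i not in suspects]
    return [points[i] for i in kept], [labels[i] for i in kept]


def polish_labels(points, labels, k=3):
    """Polishing (toy version): keep every instance, rewrite suspect labels."""
    polished = list(labels)
    for i in flag_suspects(points, labels, k):
        # Votes are computed from the original labels, not the partly polished ones.
        polished[i] = loo_vote(points, labels, i, k)
    return polished


def count_restored(clean, noisy, polished):
    """Number of corrupted labels that polishing returned to the clean value."""
    return sum(1 for c, n, p in zip(clean, noisy, polished) if n != c and p == c)


# Hypothetical data: two well-separated clusters, one label flipped at index 2.
points = [10, 11, 12, 13, 50, 51, 52, 53]
clean = ["a", "a", "a", "a", "b", "b", "b", "b"]
noisy = ["a", "a", "b", "a", "b", "b", "b", "b"]

suspects = flag_suspects(points, noisy)
filtered_points, filtered_labels = filter_noise(points, noisy)
polished_labels = polish_labels(points, noisy)

print(suspects)                                        # [2]
print(len(filtered_labels), len(polished_labels))      # 7 8
print(count_restored(clean, noisy, polished_labels))   # 1
```

On this toy data, filtering leaves seven instances while polishing keeps all eight, which is the practical appeal of correction when data is scarce. Counting restored labels is only possible here because the clean labels are known, and it ignores any clean labels that a corrector might damage; a fuller metric would need to account for both.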