Data warehousing: data cleaning and loading

  • Authors: Toby Bloom
  • Affiliations: Chief Technology Officer, Clinsoft Corporation, Lexington, Massachusetts
  • Venue: Handbook of data mining and knowledge discovery
  • Year: 2002

Abstract

The process of preparing data for mining includes extracting the data from multiple sources, cleaning it, transforming it to a common format, and finally writing it to the target warehouse(s) or file(s). Because data mining applications look for patterns and correlations that were not predicted in advance, they are particularly susceptible to spurious results when the input data contain errors: important patterns may be missed, and nonexistent correlations may be detected. The data must therefore be cleaned and transformed into a structure usable by the data mining application. The process involves moving and processing very large amounts of data on a regular basis, and performing these steps efficiently poses significant challenges. This article describes the kinds of corrections that must be made and techniques for managing and optimizing the transformation process.
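
The four steps named in the abstract (extract, clean, transform to a common format, load) can be illustrated with a minimal sketch. It is not the article's method: the two in-memory sources, their field names, the sample records, and the helper functions `extract`, `clean`, `transform`, `load`, and `run_pipeline` are all hypothetical, and a real warehouse load would read from external systems and handle far larger volumes.

```python
from datetime import datetime

# Hypothetical raw records from two sources with different layouts.
SOURCE_A = [
    {"id": "1", "name": " Alice ", "visit": "2001-03-15"},
    {"id": "2", "name": "Bob", "visit": ""},  # missing visit date
]
SOURCE_B = [
    {"patient_id": "3", "full_name": "CAROL", "visit_date": "03/17/2001"},
    {"patient_id": "1", "full_name": "alice", "visit_date": "03/15/2001"},  # same id as a record in A
]


def extract():
    """Yield (source_name, record) pairs from every source."""
    for rec in SOURCE_A:
        yield "A", rec
    for rec in SOURCE_B:
        yield "B", rec


def clean(rec):
    """Trim whitespace; reject records with an empty field (returns None)."""
    cleaned = {k: v.strip() for k, v in rec.items()}
    if any(v == "" for v in cleaned.values()):
        return None
    return cleaned


def transform(source, rec):
    """Map a source-specific record onto one common schema."""
    if source == "A":
        visit = datetime.strptime(rec["visit"], "%Y-%m-%d")
        return {"id": int(rec["id"]), "name": rec["name"].title(),
                "visit": visit.date().isoformat()}
    visit = datetime.strptime(rec["visit_date"], "%m/%d/%Y")
    return {"id": int(rec["patient_id"]), "name": rec["full_name"].title(),
            "visit": visit.date().isoformat()}


def load(rows, target):
    """Append rows to the target, skipping ids already loaded.

    Treating duplicate detection as part of loading is an assumption
    made to keep the sketch short.
    """
    seen = {row["id"] for row in target}
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            target.append(row)


def run_pipeline():
    target = []  # stands in for the warehouse table
    rows = []
    for source, raw in extract():
        rec = clean(raw)
        if rec is not None:
            rows.append(transform(source, rec))
    load(rows, target)
    return target


if __name__ == "__main__":
    for row in run_pipeline():
        print(row)
```

In this sketch the record with the missing date is rejected during cleaning, and the second record for id 1 is dropped at load time, so only two rows reach the target. Whether to reject, repair, or flag such records is a policy choice that depends on the mining application.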