Application of clustering to estimate missing data and improve data integrity

  • Authors:
  • R. C. T. Lee; J. R. Slagle; C. T. Mong

  • Venue:
  • ICSE '76: Proceedings of the 2nd International Conference on Software Engineering
  • Year:
  • 1976

Abstract

Two problems in the use of computerized data base systems are: (1) How can we estimate the values for missing data? (2) How can we improve data integrity, that is, reduce the number of errors in the data? The tool that we introduce to attack these problems is clustering analysis. Experimental results indicate that our method is feasible. Our algorithm detected an error in the book “Weyer's Warships of the World 1969.” Each of the approximately 2000 warships listed in the book has 18 variables associated with it. It would be difficult for a person to find errors in the book. Our methods do not require any a priori knowledge about the data, for example, about warships.
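
The abstract does not say which clustering procedure the authors used, so the following is only a minimal sketch of the general idea, not their algorithm. It assumes a nearest-neighbour grouping as a stand-in for clustering: a missing value is estimated as the median of that field among the most similar records, and a stored value is flagged as a possible error when it differs sharply from the estimate it would receive if it were missing. The function names (`distance`, `estimate`, `suspicious`), the parameters `k` and `tolerance`, and the three-field ship records are all hypothetical; the numbers are invented for illustration and are not taken from the book.

```python
from math import sqrt
from statistics import median

def distance(a, b, skip=None):
    """Root-mean-square difference over fields known in both records,
    ignoring the field at index `skip`."""
    shared = [(x, y) for i, (x, y) in enumerate(zip(a, b))
              if i != skip and x is not None and y is not None]
    if not shared:
        return float("inf")
    return sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def estimate(records, target, field, k=3):
    """Estimate records[target][field] as the median of that field
    among the k most similar other records (assumed grouping rule)."""
    candidates = [r for i, r in enumerate(records)
                  if i != target and r[field] is not None]
    candidates.sort(key=lambda r: distance(records[target], r, skip=field))
    return median(r[field] for r in candidates[:k])

def suspicious(records, tolerance=0.5, k=3):
    """Return (record index, field index) pairs whose stored value differs
    from its neighbour-based estimate by more than `tolerance` (relative)."""
    flagged = []
    for i, rec in enumerate(records):
        for f, value in enumerate(rec):
            if value is None:
                continue
            guess = estimate(records, i, f, k)
            if abs(value - guess) > tolerance * max(abs(guess), 1e-9):
                flagged.append((i, f))
    return flagged

# Invented records: [length, beam, displacement]. Not real ship data.
# Fields are left unscaled here; real data would need normalisation first.
ships = [
    [100.0, 10.0, 2.0],
    [102.0, 10.5, 2.1],
    [98.0, 9.8, 1.9],
    [101.0, 10.2, 12.0],   # planted error: displacement out of line
    [200.0, 25.0, 20.0],
    [205.0, 26.0, 21.0],
    [198.0, 24.5, 19.5],
    [202.0, 25.5, 20.5],
]
incomplete = ships + [[199.0, 24.8, None]]  # displacement missing

print(estimate(incomplete, 8, 2))  # 20.0
print(suspicious(ships))           # [(3, 2)]
```

The median is used rather than the mean so that a single erroneous neighbour does not drag the estimate of otherwise correct records. Like the method described in the abstract, the sketch uses no knowledge about warships; it relies only on the similarity between records.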