Generating Synthetic Data to Match Data Mining Patterns

  • Authors:
  • Josh Eno; Craig W. Thompson

  • Affiliations:
  • University of Arkansas; University of Arkansas

  • Venue:
  • IEEE Internet Computing
  • Year:
  • 2008

Abstract

Synthetic data sets can be useful for repeatable regression testing and for providing realistic, but not real, data to third parties for testing new software. In some cases, the synthetic data set should be realistic, preserving various properties of the original data. Several existing synthetic data generators produce data that superficially matches known characteristics of the original data. This paper shows how to generate data that exhibits some of the same hidden patterns that data mining algorithms can discover, in particular, decision tree patterns.
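The abstract does not describe the generation algorithm itself. As a rough illustration only, and not the authors' method, one simple way to make a decision tree pattern recoverable from synthetic data is to treat each root-to-leaf path as a box of attribute ranges with a class label, then sample rows inside those boxes in chosen proportions. The `Leaf` type, the `generate` function, the `age`/`income` attributes, and the thresholds below are all hypothetical.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Leaf:
    """One root-to-leaf path of a hypothetical decision tree."""
    bounds: dict[str, tuple[float, float]]  # attribute -> (low, high) range on this path
    label: str                              # class predicted at this leaf
    weight: float                           # relative share of rows drawn from this leaf


def generate(leaves: list[Leaf], n: int, seed: int = 0) -> list[dict]:
    """Sample n synthetic rows so that each row lies inside one leaf's region."""
    rng = random.Random(seed)  # seeded so runs are repeatable (useful for regression tests)
    # Choose a leaf for each row in proportion to the leaf weights.
    picks = rng.choices(leaves, weights=[leaf.weight for leaf in leaves], k=n)
    rows = []
    for leaf in picks:
        # Draw each attribute uniformly within that leaf's bounds.
        row = {attr: rng.uniform(lo, hi) for attr, (lo, hi) in leaf.bounds.items()}
        row["class"] = leaf.label
        rows.append(row)
    return rows


# Hypothetical target pattern: age < 30 -> "no"; otherwise income decides the class.
EXAMPLE_LEAVES = [
    Leaf({"age": (18, 30), "income": (10_000, 150_000)}, "no", 0.3),
    Leaf({"age": (30, 80), "income": (10_000, 50_000)}, "no", 0.3),
    Leaf({"age": (30, 80), "income": (50_000, 150_000)}, "yes", 0.4),
]

if __name__ == "__main__":
    for row in generate(EXAMPLE_LEAVES, 5, seed=42):
        print(row)
```

A decision tree learner trained on rows like these would be expected to place splits near age 30 and income 50,000. The sketch deliberately ignores label noise, correlations between attributes, and categorical attributes, all of which a realistic generator would need to handle.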