Generating Synthetic Data to Match Data Mining Patterns

Authors:
Josh Eno; Craig W. Thompson
Affiliations:
University of Arkansas; University of Arkansas
Venue:
IEEE Internet Computing
Year:
2008

Cited by:
The new iris data: modular data generators. Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Abstract

Synthetic data sets can be useful for repeatable regression testing and for giving third parties realistic, but not real, data with which to test new software. In some cases the synthetic data set should be realistic in a stronger sense: it should preserve various properties of the original data. Several existing synthetic data generators produce data that only superficially matches known characteristics of the original data. This paper shows how to generate data that also exhibits some of the same hidden patterns that data mining algorithms can discover in the original data, in particular decision tree patterns.
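The abstract does not spell out the generation procedure. One plausible way to make synthetic data carry a decision tree pattern is to treat each leaf as a region of attribute space, where the region is the conjunction of the split conditions on its root-to-leaf path. The generator then fills each region with the right number of records and the right class mix. The following is a minimal sketch under that assumption and is not the authors' algorithm. The `Leaf` structure, the `generate` function, the uniform sampling inside each leaf, and the toy `age`/`income` tree are all hypothetical illustrations.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Leaf:
    """One leaf of a hypothetical decision tree, described by its root-to-leaf path."""
    bounds: dict[str, tuple[float, float]]  # attribute -> (lo, hi) implied by the path's splits
    weight: float                            # fraction of generated records routed to this leaf
    class_dist: dict[str, float]             # class label -> probability inside this leaf


def generate(leaves, domains, n, seed=0):
    """Draw n records; each leaf receives about weight * n records with its class mix.

    Assumption (not from the paper): values are sampled uniformly inside each leaf.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        # 1. Choose a leaf in proportion to its weight.
        leaf = rng.choices(leaves, weights=[lf.weight for lf in leaves])[0]
        row = {}
        for attr, (dlo, dhi) in domains.items():
            # 2. Attributes not tested on this path keep their full domain;
            #    tested ones are clipped to the leaf's interval, then sampled uniformly.
            lo, hi = leaf.bounds.get(attr, (dlo, dhi))
            row[attr] = rng.uniform(max(lo, dlo), min(hi, dhi))
        # 3. Assign a class label according to the leaf's class distribution.
        labels = list(leaf.class_dist)
        row["class"] = rng.choices(labels, weights=[leaf.class_dist[c] for c in labels])[0]
        records.append(row)
    return records


# Toy tree with hypothetical attributes: split on age at 30, then on income at 50.
DOMAINS = {"age": (18.0, 90.0), "income": (0.0, 200.0)}
LEAVES = [
    Leaf({"age": (18.0, 30.0)}, 0.3, {"no": 0.9, "yes": 0.1}),
    Leaf({"age": (30.0, 90.0), "income": (0.0, 50.0)}, 0.3, {"no": 0.7, "yes": 0.3}),
    Leaf({"age": (30.0, 90.0), "income": (50.0, 200.0)}, 0.4, {"yes": 1.0}),
]

if __name__ == "__main__":
    for rec in generate(LEAVES, DOMAINS, 5, seed=42):
        print(rec)
```

You could check the result by fitting a tree learner, such as scikit-learn's `DecisionTreeClassifier`, on the generated records and comparing its splits with the original tree. That check is also an assumption and is not taken from the paper. Uniform sampling inside each leaf is the simplest choice. Preserving per-attribute marginal distributions as well would need a richer per-leaf sampler.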