The new iris data: modular data generators
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Synthetic data sets can be useful for repeatable regression testing and for providing realistic, but not real, data to third parties for testing new software. In some cases, it is desirable that the synthetic data set be realistic, preserving various properties of the original data. Several existing synthetic data generators produce data that superficially matches known characteristics of the original data. This paper shows how to generate data that exhibits some of the same hidden patterns that data mining algorithms can discover, in particular decision tree patterns.
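
To make the idea of planting decision tree patterns concrete, the following is a minimal sketch and not the paper's generator. It assumes a small hand-written tree over numeric features: each row is produced by walking the tree at random, narrowing the allowed range of each split feature along the path, and then sampling feature values inside those ranges. The tree, the feature names, the value ranges, and the branch weights are all hypothetical, chosen only for illustration.

```python
import random

# Hypothetical tree (illustrative only). Internal nodes split a numeric
# feature at a threshold; "weight" is the assumed fraction of rows that
# go to the left branch. Leaves carry a class label.
TREE = {
    "feature": "petal_length", "threshold": 2.5, "weight": 0.33,
    "left": {"label": "setosa"},
    "right": {
        "feature": "petal_width", "threshold": 1.7, "weight": 0.5,
        "left": {"label": "versicolor"},
        "right": {"label": "virginica"},
    },
}

# Assumed overall value range for each feature (illustrative only).
RANGES = {"petal_length": (1.0, 7.0), "petal_width": (0.1, 2.5)}


def generate_row(tree, ranges, rng):
    """Walk the tree at random, then sample values consistent with the path."""
    bounds = {f: list(r) for f, r in ranges.items()}
    node = tree
    while "label" not in node:
        f, t = node["feature"], node["threshold"]
        if rng.random() < node["weight"]:
            bounds[f][1] = min(bounds[f][1], t)  # left branch: value < t
            node = node["left"]
        else:
            bounds[f][0] = max(bounds[f][0], t)  # right branch: value >= t
            node = node["right"]
    # Sample every feature uniformly inside the range left by the path.
    # (A draw landing exactly on a threshold is possible but very unlikely.)
    row = {f: rng.uniform(lo, hi) for f, (lo, hi) in bounds.items()}
    row["label"] = node["label"]
    return row


def classify(tree, row):
    """Apply the same tree to a row, using the same split convention."""
    node = tree
    while "label" not in node:
        side = "left" if row[node["feature"]] < node["threshold"] else "right"
        node = node[side]
    return node["label"]


def generate(tree, ranges, n, seed=0):
    """Generate n rows reproducibly from a fixed seed."""
    rng = random.Random(seed)
    return [generate_row(tree, ranges, rng) for _ in range(n)]
```

Because every row is sampled inside the region of its leaf, applying `classify` to the generated rows should reproduce their labels, and a decision tree learner trained on them would be expected to find similar splits. The fixed seed keeps the output repeatable, which suits the regression-testing use mentioned above.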