Declarative generation of synthetic XML data: Research Articles

  • Authors: Denilson Barbosa; Alberto O. Mendelzon

  • Affiliations: Department of Computer Science, University of Calgary, Calgary, AB, Canada; Department of Computer Science, University of Toronto, Toronto, ON, Canada

  • Venue: Software—Practice & Experience
  • Year: 2006

Abstract

Synthetic data can be extremely useful in testing and evaluating algorithms, tools and systems. Most synthetic data generators available today are the result of individual benchmarking efforts. Typically, these are complex programs in which the specifications of both the structure and the contents of the data are hard-coded. As a result, it is often difficult to customize these tools to produce synthetic data tailored to specific needs. In this article, we describe the ToXgene synthetic data generator, a declarative tool for generating realistic XML data for both benchmarking and testing. We present our template specification language, which augments XML Schema with probabilistic models that guide the data-generation process. We discuss the architecture of our current implementation, and we argue for ToXgene's usefulness by discussing experimental results and describing two projects that use our tool. Copyright © 2006 John Wiley & Sons, Ltd.
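
The general idea of a declarative template with attached probabilistic models can be illustrated with a minimal Python sketch. This is not ToXgene's template language or implementation: the dictionary-based template format, the element names (catalog, book, title, price, author) and the functions generate and generate_document are all hypothetical, chosen only to show structure and content being declared as data rather than hard-coded in the generator.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical template (an assumption for illustration, not ToXgene syntax):
# each entry names an element, gives a distribution for how many times it
# occurs under its parent ("count"), and optionally a generator for its text.
TEMPLATE = {
    "name": "catalog",
    "children": [
        {
            "name": "book",
            # Number of <book> elements: uniform integer in [2, 5].
            "count": lambda rng: rng.randint(2, 5),
            "children": [
                {"name": "title", "count": lambda rng: 1,
                 "text": lambda rng: "Title " + str(rng.randint(1, 1000))},
                # Price: normal(30, 10), clamped below at 1.0.
                {"name": "price", "count": lambda rng: 1,
                 "text": lambda rng: f"{max(1.0, rng.gauss(30.0, 10.0)):.2f}"},
                # One to three authors, drawn from a fixed list of names.
                {"name": "author", "count": lambda rng: rng.randint(1, 3),
                 "text": lambda rng: rng.choice(
                     ["A. Author", "B. Writer", "C. Scribe"])},
            ],
        }
    ],
}


def generate(spec, rng):
    """Recursively build one element from its template entry."""
    elem = ET.Element(spec["name"])
    if "text" in spec:
        elem.text = spec["text"](rng)
    for child in spec.get("children", []):
        # Sample the occurrence count, then generate that many children.
        for _ in range(child["count"](rng)):
            elem.append(generate(child, rng))
    return elem


def generate_document(template, seed):
    """Generate a document root; the same seed yields the same document."""
    return generate(template, random.Random(seed))


if __name__ == "__main__":
    doc = generate_document(TEMPLATE, seed=42)
    print(ET.tostring(doc, encoding="unicode"))
```

Changing the shape or the distributions of the generated data then only requires editing the template, which is the kind of customization that hard-coded generators make difficult.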