Towards realistic sampling: generating dependencies in a relational database

  • Authors:
  • Teodora sandra buda;John Murphy;Morten Kristiansen

  • Affiliations:
  • University College Dublin;University College Dublin;IBM Software Group, Dublin, Ireland

  • Venue:
  • Proceedings of the 7th International Conference on Ubiquitous Information Management and Communication
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Managing large amounts of information is one of the most expensive, time-consuming and non-trivial activities and it usually requires expert knowledge. In a wide range of application areas, such as data mining, histogram construction, approximate query evaluation, and software validation, handling exponentially growing databases has become a difficult challenge, and a subset of the data is generally preferred. As a solution to the current challenges in managing large amounts of data, database sampling from the operational data available has proved to be a powerful technique. However, none of the existing sampling approaches consider the dependencies between the data in a relational database. In this paper, we propose a novel approach towards constructing a realistic testing environment, by analyzing the distribution of data in the original database along these dependencies before sampling, so that the sample database is representative to the original database.