The power of sampling in knowledge discovery

  • Authors: Jyrki Kivinen; Heikki Mannila

  • Affiliations: University of California, Santa Cruz; University of Helsinki

  • Venue: PODS '94 Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems

  • Year: 1994

Abstract

We consider the problem of approximately verifying the truth of sentences of tuple relational calculus in a given relation $M$ by considering only a random sample of $M$. We define two different measures for the error of a universal sentence in a relation. For a set of $n$ universal sentences each with at most $k$ universal quantifiers, we give upper and lower bounds for the sample sizes required for having a high probability that all the sentences with error at least $\varepsilon$ can be detected as false by considering the sample. The sample sizes are $O((\log n)/\varepsilon)$ or $O((|M|^{1-1/k}) \log n/\varepsilon)$, depending on the error measure used. We also consider universal-existential sentences.
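
To give a feel for where a bound of the form $O((\log n)/\varepsilon)$ can come from, here is a minimal sketch for the simplest case only: sentences with a single universal quantifier, with the error assumed to be the fraction of tuples of $M$ that violate the sentence. That error definition, the confidence parameter `delta`, and the function names are illustrative assumptions, not taken from the paper. The sketch uses the standard union-bound argument: a sentence with error at least $\varepsilon$ survives $m$ independent draws with probability at most $(1-\varepsilon)^m \le e^{-\varepsilon m}$, so $m \ge \ln(n/\delta)/\varepsilon$ draws suffice for all $n$ sentences simultaneously.

```python
import math
import random


def sample_size(n_sentences, eps, delta):
    """Draws needed so that, with probability at least 1 - delta, every one of
    n_sentences single-quantifier sentences with error >= eps is violated by
    at least one sampled tuple (union bound; delta is an assumed parameter)."""
    return math.ceil(math.log(n_sentences / delta) / eps)


def rejected_by_sample(relation, sentences, m, rng):
    """Return the indices of sentences falsified by a random sample.

    relation  -- list of tuples (the relation M)
    sentences -- list of predicates; sentence i asserts predicate(t) for all t
    m         -- number of tuples drawn uniformly with replacement
    rng       -- a random.Random instance
    """
    sample = [rng.choice(relation) for _ in range(m)]
    return [
        i for i, holds in enumerate(sentences)
        if not all(holds(t) for t in sample)
    ]


if __name__ == "__main__":
    # Hypothetical relation of 1000 one-column tuples.
    relation = [(v,) for v in range(1000)]
    sentences = [
        lambda t: t[0] >= 0,    # true in the relation (error 0)
        lambda t: t[0] >= 100,  # violated by 100 of 1000 tuples (error 0.1)
    ]
    m = sample_size(len(sentences), eps=0.1, delta=0.01)
    print("sample size:", m)
    print("rejected:", rejected_by_sample(relation, sentences, m, random.Random(0)))
```

Note that the computed sample size does not depend on $|M|$, which is consistent with the first of the two bounds stated in the abstract. A sentence that is true in $M$ can never be rejected by this check, so the only failure mode is missing a false sentence. The $|M|^{1-1/k}$ bound for the other error measure is not covered by this sketch.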