A nearest neighborhood algebra for probabilistic databases

  • Authors:
  • Shichao Zhang

  • Affiliations:
  • (Currently on leave for one year to: Sch. of Math. and Comp. Sci., Guangxi Normal Univ., Guilin 541004, P.R. China) Sch. of Comp., Natnl. Univ. of Singapore, Lower Kent Ridge, Singapore 119260. zh ...

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2000

Abstract

Queries on probabilistic databases should be based on approximate matching rather than exact matching, partly because the user may not know the exact probabilities of objects in a database. In addition, the domain of an attribute in a 1NF relational scheme is generally required to be finite, whereas the domain (0, 1] of the attribute that describes the probabilistic significance of an object is infinite. Such an infinite domain does not seem appropriate for approximate queries. To address this, a probabilistic data model for representing probabilistic data is advocated in this paper. The model is based on our definition of the nearest neighbor of data, which is used to measure the equality of probabilistic data. As a result, both the approximate nature and the infinite semantics of probabilistic data can be modeled by the nearest neighbor. Furthermore, a probabilistic relational algebra is proposed for approximately querying such databases.
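
As a rough illustration of the idea of treating nearby probabilities as equal, the following Python sketch maps each probability in (0, 1] to the nearest point of a finite grid and selects tuples whose probability shares a grid point with the query value. The abstract does not give the paper's actual definition of the nearest neighbor, so the grid-based rule, the function names (`nearest_representative`, `approx_equal`, `approx_select`), the `levels` parameter, and the sample relation are all assumptions made for illustration only.

```python
from fractions import Fraction


def nearest_representative(p, levels=10):
    """Map a probability in (0, 1] to the nearest grid point k/levels.

    Assumption: a uniform grid stands in for the paper's nearest-neighbor
    definition, which the abstract does not spell out.
    """
    if not 0 < p <= 1:
        raise ValueError("probability must lie in (0, 1]")
    # Clamp to at least 1 so the result stays inside (0, 1].
    k = max(1, round(p * levels))
    return Fraction(k, levels)


def approx_equal(p, q, levels=10):
    """Treat two probabilities as equal if they share a representative."""
    return nearest_representative(p, levels) == nearest_representative(q, levels)


def approx_select(relation, target, levels=10):
    """Approximate selection: keep rows whose probability matches `target`."""
    return [row for row in relation if approx_equal(row[1], target, levels)]


# Hypothetical relation of (object, probability) pairs.
relation = [("a", 0.42), ("b", 0.38), ("c", 0.71), ("d", 0.44)]
result = approx_select(relation, 0.4)
```

With ten levels, the infinite domain (0, 1] collapses to ten representative values, so a query for 0.4 also retrieves objects stored with probabilities such as 0.38 or 0.44. A finer or coarser grid changes how strict the approximate match is.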