A model of uncertainty for near-duplicates in document reference networks

  • Authors:
  • Claudia Hess; Michel De Rougemont

  • Affiliations:
  • Laboratory for Semantic Information Technology, Bamberg University; LRI, Université Paris-Sud

  • Venue:
  • ECDL '07: Proceedings of the 11th European Conference on Research and Advanced Technology for Digital Libraries
  • Year:
  • 2007

Abstract

We introduce a model of uncertainty in which documents are not uniquely identified in a reference network and some links may be incorrect. The model generalizes the probabilistic approach to databases to graphs and defines a probability distribution over subgraphs. The answer to a relational query is a distribution over documents; we study how to approximate the ranking of the most likely documents and how to quantify the quality of that approximation. The answer to a function query is a distribution over values, and we use the size of the interval between the minimum and maximum values as a measure of the precision of the answer.
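
The ideas in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes, purely for illustration, that each link is correct independently with a given probability, and it enumerates every possible subgraph exactly rather than approximating. The network `EDGES` and the helper names (`possible_worlds`, `citation_count_distribution`, `value_interval`, `rank_cited`) are hypothetical.

```python
from itertools import product

# Hypothetical uncertain reference network:
# (citing document, cited document) -> probability that the link is correct.
EDGES = {
    ("a", "c"): 1.0,
    ("b", "c"): 0.5,
    ("b", "d"): 0.5,
    ("d", "c"): 0.8,
}


def possible_worlds(edges):
    """Yield (subgraph, probability) pairs, assuming links are independent."""
    items = list(edges.items())
    for keep in product([True, False], repeat=len(items)):
        prob = 1.0
        world = []
        for (edge, p), kept in zip(items, keep):
            prob *= p if kept else 1.0 - p
            if kept:
                world.append(edge)
        if prob > 0.0:  # skip subgraphs that cannot occur
            yield world, prob


def citation_count_distribution(edges, doc):
    """Function query: distribution of the number of citations of `doc`."""
    dist = {}
    for world, prob in possible_worlds(edges):
        count = sum(1 for _, cited in world if cited == doc)
        dist[count] = dist.get(count, 0.0) + prob
    return dist


def value_interval(dist):
    """Minimum and maximum values that occur with nonzero probability."""
    return min(dist), max(dist)


def rank_cited(edges):
    """Relational query 'documents cited at least once', ranked by probability."""
    scores = {}
    for world, prob in possible_worlds(edges):
        for doc in {cited for _, cited in world}:
            scores[doc] = scores.get(doc, 0.0) + prob
    return sorted(scores.items(), key=lambda item: -item[1])


if __name__ == "__main__":
    dist = citation_count_distribution(EDGES, "c")
    low, high = value_interval(dist)
    print("citation count distribution for c:", dist)
    print("interval size:", high - low)
    print("ranking:", rank_cited(EDGES))
```

Exhaustive enumeration grows exponentially with the number of uncertain links, which is one reason an approximate ranking, as studied in the paper, is of interest; sampling subgraphs would be one possible substitute in this sketch.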