ProTDB: probabilistic data in XML

  • Authors:
  • Andrew Nierman;H. V. Jagadish

  • Affiliations:
  • Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI;Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI

  • Venue:
  • VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

Where as traditional databases manage only deterministic information, many applications that use databases involve uncertain data. This paper presents a Probabilistic Tree Data Base (ProTDB) to manage probabilistic data, represented in XML. Our approach differs from previous efforts to develop probabilistic relational systems in that we build a probabilistic XML database. This design is driven by application needs that involve data not readily amenable to a relational representation. XML data poses several modeling challenges: due to its structure, due to the possibility of uncertainty association at multiple granularities, and due to the possibility of missing and repeated sub-elements. We present a probabilistic XML model that addresses all of these challenges. We devise an implementation of XML query operations using our probability model, and demonstrate the efficiency of our implementation experimentally. We have used ProTDB to manage data from two application areas: protein chemistry data from the bioinformatics domain, and information extraction data obtained from the web using a natural language analysis system. We present a brief case study of the latter to demonstrate the value of probabilistic XML data management.