A probabilistic learning method for XML annotation of documents

  • Authors:
  • Boris Chidlovskii;Jérôme Fuselier

  • Affiliations:
  • Xerox Research Centre Europe, Meylan, France;Xerox Research Centre Europe, Meylan, France and Université de Savoie, Laboratoire SysCom, Domaine Universitaire, Le Bourget-du-Lac, France

  • Venue:
  • IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
  • Year:
  • 2005

Abstract

We consider the problem of semantic annotation of semi-structured documents according to a target XML schema. The task is to annotate a document in a tree-like manner, where the annotation tree is an instance of a tree class defined by a DTD or W3C XML Schema description. In the probabilistic setting, we cast tree annotation as generalized probabilistic context-free parsing of an observation sequence, where each observation comes with a probability distribution over terminals supplied by a probabilistic classifier applied to the document content. We determine the most probable tree annotation by maximizing the joint probability of selecting a terminal sequence for the observation sequence and the most probable parse for the selected terminal sequence.
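The joint maximization described in the abstract can be illustrated with a small Viterbi-style CYK parser in which each leaf is scored by the lexical rule probability times the classifier's probability for that terminal. This is a minimal sketch, not the authors' implementation: it assumes the grammar has already been converted to Chomsky normal form, and the function name `best_annotation`, the rule dictionaries, and the toy grammar (`Doc`, `Title`, `Body`) are all hypothetical.

```python
from collections import defaultdict


def best_annotation(observations, lexical_rules, binary_rules, start):
    """Most probable parse over uncertain terminals (grammar assumed in CNF).

    observations:  list of dicts {terminal: P(terminal | observation)}
    lexical_rules: dict {(A, terminal): P(A -> terminal)}
    binary_rules:  dict {(A, B, C): P(A -> B C)}
    Returns (probability, tree); (0.0, None) if no parse exists.
    """
    n = len(observations)
    # chart[(i, j)][A] = (probability, tree) of the best subtree rooted at A
    # covering observations i .. j-1.
    chart = defaultdict(dict)

    # Leaves: choose terminal and lexical rule jointly, weighting the rule
    # probability by the classifier's probability for that terminal.
    for i, dist in enumerate(observations):
        cell = chart[(i, i + 1)]
        for (a, t), p_rule in lexical_rules.items():
            p = p_rule * dist.get(t, 0.0)
            if p > cell.get(a, (0.0, None))[0]:
                cell[a] = (p, (a, t))

    # Internal nodes: standard Viterbi CYK over increasing span widths.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = chart[(i, j)]
            for k in range(i + 1, j):
                left, right = chart[(i, k)], chart[(k, j)]
                for (a, b, c), p_rule in binary_rules.items():
                    if b in left and c in right:
                        p = p_rule * left[b][0] * right[c][0]
                        if p > cell.get(a, (0.0, None))[0]:
                            cell[a] = (p, (a, left[b][1], right[c][1]))

    return chart[(0, n)].get(start, (0.0, None))


# Toy schema (hypothetical): a document is a title followed by a body.
lexical_rules = {("Title", "title"): 1.0, ("Body", "para"): 1.0}
binary_rules = {("Doc", "Title", "Body"): 1.0}

# Classifier output per observation; "title" is the locally best label for both.
observations = [
    {"title": 0.6, "para": 0.4},
    {"title": 0.55, "para": 0.45},
]

prob, tree = best_annotation(observations, lexical_rules, binary_rules, "Doc")
print(prob, tree)
```

In this toy case the classifier alone would label both observations `title`, which the grammar cannot parse. The joint maximization instead selects the sequence `title para`, the only one that yields a valid `Doc` tree.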