iDocument: using ontologies for extracting and annotating information from unstructured text

  • Authors:
  • Benjamin Adrian;Jörn Hees;Ludger Van Elst;Andreas Dengel

  • Affiliations:
  • Knowledge Management Department, DFKI, Kaiserslautern, Germany;CS Department, University of Kaiserslautern, Kaiserslautern, Germany;Knowledge Management Department, DFKI, Kaiserslautern, Germany;Knowledge Management Department, DFKI, Kaiserslautern, Germany and CS Department, University of Kaiserslautern, Kaiserslautern, Germany

  • Venue:
  • KI'09 Proceedings of the 32nd annual German conference on Advances in artificial intelligence
  • Year:
  • 2009

Abstract

Due to the huge amount of text data on the WWW, annotating unstructured text with semantic markup is a crucial topic in Semantic Web research. This work formally analyzes the incorporation of domain ontologies into information extraction tasks in iDocument. Ontology-based information extraction exploits domain ontologies with formalized and structured domain knowledge for extracting domain-relevant information from un-annotated and unstructured text. iDocument provides a pipeline architecture, an extraction template interface, and the ability to exchange domain ontologies for performing information extraction tasks. This work outlines iDocument's ontology-based architecture, the use of SPARQL queries as extraction templates, and an evaluation of iDocument in an automatic document annotation scenario.
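
To make the idea of a query used as an extraction template more concrete, the following is a minimal sketch in plain Python. It is not iDocument's implementation: the toy ontology, the `ex:` names, and the functions `recognize_instances`, `match_template`, and `extract` are all hypothetical and chosen only for illustration. The template is written as a list of triple patterns, which mimics the basic graph pattern of a SPARQL query without requiring a SPARQL engine.

```python
import re

# Hypothetical toy domain ontology as (subject, predicate, object) triples.
# All ex: identifiers are illustrative assumptions, not taken from iDocument.
ONTOLOGY = {
    ("ex:BenjaminAdrian", "rdf:type", "ex:Person"),
    ("ex:BenjaminAdrian", "rdfs:label", "Benjamin Adrian"),
    ("ex:BenjaminAdrian", "ex:memberOf", "ex:DFKI"),
    ("ex:DFKI", "rdf:type", "ex:Organization"),
    ("ex:DFKI", "rdfs:label", "DFKI"),
    ("ex:DFKI", "ex:locatedIn", "ex:Kaiserslautern"),
    ("ex:Kaiserslautern", "rdf:type", "ex:City"),
    ("ex:Kaiserslautern", "rdfs:label", "Kaiserslautern"),
}

# Extraction template as triple patterns; terms starting with "?" are variables.
# Roughly corresponds to the SPARQL pattern:
#   SELECT ?person ?org WHERE { ?person rdf:type ex:Person . ?person ex:memberOf ?org }
TEMPLATE = [
    ("?person", "rdf:type", "ex:Person"),
    ("?person", "ex:memberOf", "?org"),
]


def recognize_instances(text, triples):
    """Gazetteer step: return instances whose rdfs:label occurs in the text."""
    found = set()
    for subject, predicate, obj in triples:
        if predicate == "rdfs:label":
            if re.search(r"\b" + re.escape(obj) + r"\b", text):
                found.add(subject)
    return found


def match_template(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        matches = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check consistency with an earlier binding.
                if candidate.setdefault(term, value) != value:
                    matches = False
                    break
            elif term != value:
                matches = False
                break
        if matches:
            yield from match_template(rest, triples, candidate)


def extract(text, triples, patterns):
    """Keep only template matches whose bound instances are all mentioned in the text."""
    mentioned = recognize_instances(text, triples)
    results = [
        b for b in match_template(patterns, triples)
        if all(value in mentioned for value in b.values())
    ]
    return sorted(results, key=lambda b: sorted(b.items()))


if __name__ == "__main__":
    sentence = "Benjamin Adrian works at DFKI in Kaiserslautern."
    for result in extract(sentence, ONTOLOGY, TEMPLATE):
        print(result)
```

In this sketch, exchanging the domain ontology amounts to passing a different set of triples, and changing the extraction task amounts to passing a different template; the matching code itself stays the same. A real system would add steps this sketch omits, such as disambiguating labels shared by several instances.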