The XTREEM Methods for Ontology Learning from Web Documents

  • Authors: Marko Brunzel
  • Affiliations: DFKI GmbH, Kaiserslautern, Germany
  • Venue: Proceedings of the 2008 conference on Ontology Learning and Population: Bridging the Gap between Text and Knowledge
  • Year: 2008

Abstract

Ontology learning has so far been dominated by techniques that take text as input; only a few methods use a different data source. Techniques that take highly structured data as input have the disadvantage that such data sources are rare. On the other hand, enormous amounts of Web content are available today. We present the XTREEM (Xhtml TREE Mining) methods, which enable ontology learning from Web documents. These methods rely on the semi-structured nature of Web documents and exploit the added value of their markup. We show methods for the acquisition of terms, synonyms and semantic relations. The XTREEM techniques are based on the structure of Web documents; they are domain and language independent, and they require neither NLP software nor training. They do not rely on resources or background knowledge specific to a domain or document collection, such as patterns, rules or other heuristics, nor do they rely on a manually assembled document collection.
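
The abstract does not spell out how the markup is exploited, so the following is only a minimal sketch of one way the semi-structure of a Web document could be used, not the XTREEM procedure itself. It assumes that text spans sharing the same path of enclosing tags (for example, items of one list) are candidates for related terms. The names `TagPathCollector`, `group_by_tag_path` and `VOID_TAGS` are hypothetical and chosen for illustration; only Python's standard `html.parser` module is used.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Tags that never enclose text; skipped so they do not distort the path.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TagPathCollector(HTMLParser):
    """Collect text spans keyed by the path of their enclosing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []                  # currently open tags
        self.groups = defaultdict(list)  # tag path -> list of text spans

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.groups["/".join(self.stack)].append(text)


def group_by_tag_path(markup):
    """Return a dict mapping each tag path to the text spans found under it."""
    collector = TagPathCollector()
    collector.feed(markup)
    collector.close()
    return dict(collector.groups)


if __name__ == "__main__":
    page = (
        "<html><body><h1>Hotels</h1>"
        "<ul><li>Sauna</li><li>Pool</li><li>Gym</li></ul>"
        "<p>Contact us</p></body></html>"
    )
    for path, spans in group_by_tag_path(page).items():
        print(path, spans)
```

In this sketch the three list items end up in one group because they share the tag path `html/body/ul/li`, while the heading and the paragraph each form their own group. No language-specific processing is involved, which is consistent with the language independence claimed in the abstract; whether and how such groups are turned into terms, synonyms or semantic relations is left open here.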