Discovering Groups of Sibling Terms from Web Documents with XTREEM-SG

  • Authors:
  • Marko Brunzel;Myra Spiliopoulou

  • Affiliations:
  • DFKI GmbH - German Research Center for Artificial Intelligence, and Otto-von-Guericke Universität Magdeburg, Germany;Otto-von-Guericke Universität Magdeburg, Germany

  • Venue:
  • Journal on Data Semantics XI
  • Year:
  • 2008

Abstract

The acquisition of explicit semantics is still a research challenge. Approaches for the extraction of semantics focus mostly on learning subordination relations. The extraction of coordination relations, also called "sibling relations", is studied much less, although such relations are no less important in ontology engineering. We describe and evaluate the XTREEM-SG approach for finding sibling semantics in semi-structured Web documents. XTREEM-SG stands for "Xhtml TREE Mining - for Sibling Groups". It uses the XHTML markup that is available in Web content to group together terms that are in a sibling relation to each other. Our approach has the advantage that it is domain- and language-independent; it does not rely on background knowledge, NLP software, or training. We evaluate XTREEM-SG against two gold standard ontologies. We investigate how variations in input, parameters, and gold standard influence the obtained results on structuring a closed vocabulary into semantic sibling groups. Earlier methods that evaluate sibling relations against a gold standard report a 14.18% F-measure on average sibling overlap. Our method improves this figure to 22.93%.
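
The idea of using markup to group sibling terms can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes, for illustration only, that text spans sharing the same tag path in one well-formed XHTML document are sibling candidates, and that a closed vocabulary is given as a set of strings. The names `TagPathGrouper` and `sibling_groups`, and the sample document, are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TagPathGrouper(HTMLParser):
    """Collect text spans keyed by the tag path that leads to them.

    Assumption: the input is well-formed XHTML, so every opened
    element is closed again (or written as <tag/>).
    """

    def __init__(self):
        super().__init__()
        self._path = []                  # stack of currently open tags
        self.groups = defaultdict(list)  # tag path -> list of text spans

    def handle_starttag(self, tag, attrs):
        self._path.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self._path:
            while self._path.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.groups["/".join(self._path)].append(text)


def sibling_groups(xhtml, vocabulary):
    """Return groups of vocabulary terms that share a tag path.

    Illustrative simplification: a text span counts as a term only if
    it matches a vocabulary entry exactly (case-insensitive).
    """
    parser = TagPathGrouper()
    parser.feed(xhtml)
    parser.close()
    vocab = {term.lower() for term in vocabulary}
    result = []
    for spans in parser.groups.values():
        terms = [s for s in spans if s.lower() in vocab]
        if len(terms) >= 2:  # a single term has no siblings
            result.append(terms)
    return result


# Hypothetical sample document and vocabulary.
SAMPLE = (
    "<html><body>"
    "<ul><li>Hotel</li><li>Hostel</li><li>Campsite</li></ul>"
    "<p>Hotel prices vary.</p>"
    "</body></html>"
)
print(sibling_groups(SAMPLE, {"hotel", "hostel", "campsite"}))
```

In this sketch the three list items share the path `html/body/ul/li` and form one group, while the paragraph text is discarded because it is not a vocabulary term. No background knowledge or linguistic processing is involved, which mirrors the domain and language independence claimed above.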