Discovering multi terms and co-hyponymy from XHTML documents with XTREEM

  • Authors: Marko Brunzel; Myra Spiliopoulou

  • Affiliations: Otto-von-Guericke-University Magdeburg; Otto-von-Guericke-University Magdeburg

  • Venue: KDXD'06 Proceedings of the First International Conference on Knowledge Discovery from XML Documents
  • Year: 2006

Abstract

The Semantic Web needs ontologies as an integral component. Current methods for learning and enhancing ontologies need to be improved further to overcome the knowledge acquisition bottleneck. Identifying concepts and relations with only minimal user interaction is still a challenging objective. Current approaches to extracting semantics often apply association rules or clustering to regular flat text. In this paper we describe an approach to extracting semantics from Web document collections that takes advantage of the semi-structured content of XHTML Web documents (XHTML is an XML dialect that can be obtained from traditional HTML documents). The XTREEM (Xhtml TREE Mining) method uses structural information, the mark-up in Web content, as an indicator of term boundaries and of co-hyponymy relations.
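
The idea of reading mark-up as a signal for term boundaries and co-hyponymy can be illustrated with a minimal sketch. It is not the XTREEM algorithm, whose details the abstract does not give. It rests on a simplifying assumption: the text of a leaf element is one candidate term (so a multi-word term keeps its boundaries), and leaf elements that share the same tag under the same parent are co-hyponym candidates. The sample document and the function name sibling_term_groups are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document, not taken from the paper.
XHTML = """<html><body>
<h1>Accommodation</h1>
<ul>
  <li>Hotel</li>
  <li>Bed and Breakfast</li>
  <li>Youth Hostel</li>
</ul>
<p>Contact us</p>
</body></html>"""


def sibling_term_groups(xhtml):
    """Return groups of candidate terms that share a parent and a tag.

    Assumption for this sketch: each leaf element's text is one term,
    and same-tag siblings are co-hyponym candidates.
    """
    root = ET.fromstring(xhtml)
    groups = []
    for parent in root.iter():
        by_tag = defaultdict(list)
        for child in parent:
            # Only leaf elements are treated as term carriers.
            if len(child) != 0:
                continue
            # Normalise whitespace inside the element text.
            text = " ".join("".join(child.itertext()).split())
            if text:
                by_tag[child.tag].append(text)
        # A group needs at least two members to suggest a sibling relation.
        for terms in by_tag.values():
            if len(terms) >= 2:
                groups.append(terms)
    return groups


if __name__ == "__main__":
    for group in sibling_term_groups(XHTML):
        print(group)
```

On the sample document the three list items form one group, and "Bed and Breakfast" stays a single multi-word term because the enclosing element delimits it. Whether such a group is a set of true co-hyponyms would still have to be decided by later analysis over many documents.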