Learning Domain Ontologies from Document Warehouses and Dedicated Web Sites

  • Authors: Roberto Navigli; Paola Velardi

  • Venue: Computational Linguistics
  • Year: 2004

Abstract

We present a method and a tool, OntoLearn, aimed at the extraction of domain ontologies from Web sites, and more generally from documents shared among the members of virtual organizations. OntoLearn first extracts a domain terminology from available documents. Then, complex domain terms are semantically interpreted and arranged in a hierarchical fashion. Finally, a general-purpose ontology, WordNet, is trimmed and enriched with the detected domain concepts. The major novel aspect of this approach is semantic interpretation, that is, the association of a complex concept with a complex term. This involves finding the appropriate WordNet concept for each word of a terminological string and the appropriate conceptual relations that hold among the concept components. Semantic interpretation is based on a new word sense disambiguation algorithm, called structural semantic interconnections.
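
To make the idea of semantic interpretation more concrete, the following is a minimal toy sketch of choosing one sense per word of a multiword term by favouring the sense combination whose members are best connected in a semantic graph. It is not the structural semantic interconnections algorithm itself, whose details the abstract does not give. The sense inventory `SENSES`, the graph `EDGES`, the scoring rule, and all function names are hypothetical and hand-made for illustration; a real system would draw senses and relations from WordNet.

```python
from itertools import product

# Hypothetical toy sense inventory: each word maps to candidate sense ids.
SENSES = {
    "bus": ["bus#vehicle", "bus#connector"],
    "service": ["service#work", "service#religious"],
}

# Hypothetical toy semantic graph: undirected links between senses and concepts.
EDGES = {
    ("bus#vehicle", "transport"),
    ("service#work", "transport"),
    ("bus#connector", "computer"),
    ("service#religious", "church"),
}


def neighbours(node):
    """Return every node directly linked to `node` in the toy graph."""
    linked = set()
    for a, b in EDGES:
        if a == node:
            linked.add(b)
        elif b == node:
            linked.add(a)
    return linked


def connection_score(s1, s2):
    """Assumed scoring rule: 1 for a direct link, plus 1 per shared neighbour."""
    direct = 1 if s2 in neighbours(s1) else 0
    shared = len(neighbours(s1) & neighbours(s2))
    return direct + shared


def interpret(term):
    """Pick one sense per word so that the pairwise connection score is maximal."""
    words = term.split()
    best, best_score = None, -1
    for combo in product(*(SENSES[w] for w in words)):
        score = sum(
            connection_score(a, b)
            for i, a in enumerate(combo)
            for b in combo[i + 1:]
        )
        if score > best_score:
            best, best_score = combo, score
    return dict(zip(words, best)), best_score


if __name__ == "__main__":
    print(interpret("bus service"))
```

In this toy graph, the vehicle sense of "bus" and the work sense of "service" are selected because both link to the same "transport" node, while every other combination shares no connection. The exhaustive search over sense combinations is only practical for short terms and is used here purely for clarity.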