Automatic ontology extraction from unstructured texts

  • Authors:
  • Khurshid Ahmad;Lee Gillam

  • Affiliations:
  • Department of Computing, University of Surrey, Guildford, Surrey, UK;Department of Computing, University of Surrey, Guildford, Surrey, UK

  • Venue:
  • OTM'05 Proceedings of the 2005 OTM Confederated international conference on On the Move to Meaningful Internet Systems: CoopIS, COA, and ODBASE - Volume Part II
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Construction of the ontology of a specific domain currently relies on the intuition of a knowledge engineer, and the typical output is a thesaurus of terms, each of which is expected to denote a concept. Ontological ‘engineers’ tend to hand-craft these thesauri on an ad-hoc basis and on a relatively smallscale. Workers in the specific domain create their own special language, and one device for this creation is the repetition of select keywords for consolidating or rejecting one or more concepts. A more scalable, systematic and automatic approach to ontology construction is possible through the automatic identification of these keywords. An approach for the study and extraction of keywords is outlined where a corpus of randomly collected unstructured, i.e. not containing any kind of mark-up, texts in a specific domain is analysed with reference to the lexical preferences of the workers in the domain. An approximation about the role of frequently used single words within multiword expressions leads us to the creation of a semantic network. The network can be asserted into a terminology database or knowledge representation formalism, and the relationship between the nodes of the network helps in the visualisation of, and automatic inference over, the frequently used words denoting important concepts in the domain. We illustrate our approach with a case study using corpora from three time periods on the emergence and consolidation of nuclear physics. The text-based approach appears to be less subjective and more suitable for introspection, and is perhaps useful in ontology evolution.