Automatic ontology extraction from unstructured texts

Authors:
Khurshid Ahmad;Lee Gillam
Affiliations:
Department of Computing, University of Surrey, Guildford, Surrey, UK;Department of Computing, University of Surrey, Guildford, Surrey, UK
Venue:
OTM'05 Proceedings of the 2005 OTM Confederated international conference on On the Move to Meaningful Internet Systems: CoopIS, DOA, and ODBASE - Volume Part II
Year:
2005

Abstract

Construction of the ontology of a specific domain currently relies on the intuition of a knowledge engineer, and the typical output is a thesaurus of terms, each of which is expected to denote a concept. Ontological ‘engineers’ tend to hand-craft these thesauri on an ad hoc basis and on a relatively small scale. Workers in the specific domain create their own special language, and one device for this creation is the repetition of select keywords for consolidating or rejecting one or more concepts. A more scalable, systematic and automatic approach to ontology construction is possible through the automatic identification of these keywords. We outline an approach to the study and extraction of keywords in which a corpus of randomly collected unstructured texts in a specific domain (that is, texts containing no mark-up of any kind) is analysed with reference to the lexical preferences of the workers in the domain. An approximation of the role of frequently used single words within multiword expressions leads us to the creation of a semantic network. The network can be asserted into a terminology database or knowledge representation formalism, and the relationships between the nodes of the network help in the visualisation of, and automatic inference over, the frequently used words denoting important concepts in the domain. We illustrate our approach with a case study using corpora from three time periods on the emergence and consolidation of nuclear physics. The text-based approach appears to be less subjective and more suitable for introspection, and is perhaps useful in ontology evolution.
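
The pipeline described in the abstract (frequent domain keywords, then multiword expressions built around them, then a network) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give their statistics, so the scoring rule here (ratio of a word's relative frequency in the domain texts to its smoothed relative frequency in a general reference text), the thresholds `min_count` and `min_ratio`, the restriction to two-word expressions, and all function names are assumptions made for illustration. The toy texts are invented and are not data from the paper.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def select_keywords(domain_tokens, reference_tokens, min_count=2, min_ratio=1.0):
    """Keep words that are frequent in the domain and relatively rare in the reference.

    Assumed scoring rule: relative frequency in the domain divided by
    add-one-smoothed relative frequency in the reference text.
    """
    domain = Counter(domain_tokens)
    reference = Counter(reference_tokens)
    n_dom, n_ref = len(domain_tokens), len(reference_tokens)
    keywords = set()
    for word, count in domain.items():
        ratio = (count / n_dom) / ((reference[word] + 1) / (n_ref + 1))
        if count >= min_count and ratio > min_ratio:
            keywords.add(word)
    return keywords


def build_network(domain_tokens, keywords):
    """Link each keyword to the two-word expressions in which it is the final word.

    Simplification: both words of the expression must be keywords, and
    sentence boundaries are ignored when pairing adjacent tokens.
    """
    network = defaultdict(set)
    for first, second in zip(domain_tokens, domain_tokens[1:]):
        if first in keywords and second in keywords:
            network[second].add(f"{first} {second}")
    return dict(network)


# Invented toy texts, for illustration only.
domain_text = (
    "The nuclear reaction releases energy. A nuclear reaction changes the nucleus. "
    "The atomic nucleus contains protons. The atomic nucleus is small."
)
reference_text = (
    "The cat sat on the mat. The dog is small. "
    "A change in the news changes the mood."
)

domain_tokens = tokenize(domain_text)
keywords = select_keywords(domain_tokens, tokenize(reference_text))
network = build_network(domain_tokens, keywords)

for head, expressions in sorted(network.items()):
    print(head, "<-", sorted(expressions))
```

In this sketch each edge runs from a head word to a longer expression that ends with it, which is one simple way to approximate a "kind of" relation (for example, treating "nuclear reaction" as a kind of "reaction"). Whether that reading holds for a given domain is an assumption that would need checking against the texts.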