A Concept-Driven Automatic Ontology Generation Approach for Conceptualization of Document Corpora

  • Authors: Hai-Tao Zheng; Charles Borchert; Hong-Gee Kim

  • Venue: WI-IAT '08, Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Volume 01
  • Year: 2008

Abstract

As the amount of available information grows, many techniques, such as document clustering and information visualization, have been developed to help users understand it. However, most of these methods do not help users directly understand the key concepts in document corpora and the semantic relationships among them, which are critical for capturing a corpus's conceptual structure. We therefore propose a novel approach called 'Clonto' that identifies key concepts and automatically generates ontologies based on them to conceptualize document corpora. Clonto applies latent semantic analysis to identify key concepts, allocates documents based on these concepts, and uses WordNet to automatically generate a corpus-related ontology. The documents are linked to the ontology through the key concepts. The experimental results show that Clonto identifies key concepts with high precision and that its clustering results outperform the STC (Suffix Tree Clustering) algorithm, the Lingo clustering algorithm, the Fuzzy Ants clustering algorithm, and clustering based on TRS (Tolerance Rough Set). Moreover, on the same document corpus, the ontology generated by Clonto exhibits a significantly more informative conceptual structure.
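The abstract does not say how Clonto's latent semantic analysis step is configured, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a raw term-frequency matrix (no TF-IDF weighting), approximates a truncated SVD by power iteration, labels each latent concept with its highest-loading term, and allocates each document to the concept on which it loads most strongly. The helper names (`term_document_matrix`, `top_singular_triplets`, `key_concepts`) and the toy corpus are hypothetical, and the WordNet ontology-generation step is not modeled.

```python
import math
import random
from collections import Counter


def term_document_matrix(docs):
    """Raw term-frequency matrix: one row per term, one column per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    counts = [Counter(toks) for toks in tokenized]
    return vocab, [[c[term] for c in counts] for term in vocab]


def matvec(m, v):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def top_singular_triplets(a, k, iters=200, seed=0):
    """Approximate the top-k (sigma, u, v) triplets of A by power iteration on
    A^T A, orthogonalizing against previously found right singular vectors."""
    rng = random.Random(seed)
    at = transpose(a)
    found = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(len(at))]
        for _ in range(iters):
            w = matvec(at, matvec(a, v))  # w = A^T A v
            for _, _, prev in found:      # remove directions already found
                dot = sum(x * y for x, y in zip(w, prev))
                w = [x - dot * y for x, y in zip(w, prev)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                return found              # no variance left: rank exhausted
            v = [x / norm for x in w]
        av = matvec(a, v)
        sigma = math.sqrt(sum(x * x for x in av))
        if sigma < 1e-12:
            return found
        found.append((sigma, [x / sigma for x in av], v))
    return found


def key_concepts(docs, k):
    """Hypothetical helper: label each latent concept by its highest-loading
    term, then assign every document to the concept it loads on most strongly."""
    vocab, a = term_document_matrix(docs)
    triplets = top_singular_triplets(a, k)
    labels = [vocab[max(range(len(vocab)), key=lambda i: abs(u[i]))]
              for _, u, _ in triplets]
    allocation = {
        d: labels[max(range(len(triplets)), key=lambda j: abs(triplets[j][2][d]))]
        for d in range(len(docs))
    }
    return labels, allocation


docs = [
    "apple banana apple fruit apple",
    "banana apple fruit fruit apple",
    "car engine car wheel",
    "car wheel engine car",
]
labels, allocation = key_concepts(docs, k=2)
print(labels)      # ['apple', 'car']
print(allocation)  # {0: 'apple', 1: 'apple', 2: 'car', 3: 'car'}
```

Power iteration is used here only to keep the sketch dependency-free. In practice a library SVD routine would replace `top_singular_triplets`, and a real system would also need stop-word removal, term weighting, and a way to map concept labels onto WordNet synsets.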