Exploiting Gene Ontology to Conceptualize Biomedical Document Collections

  • Authors:
  • Hai-Tao Zheng; Charles Borchert; Hong-Gee Kim

  • Affiliations:
  • Biomedical Knowledge Engineering Laboratory, Seoul National University, Seoul, Korea (all three authors)

  • Venue:
  • ASWC '08: Proceedings of the 3rd Asian Semantic Web Conference on The Semantic Web
  • Year:
  • 2008

Abstract

As biomedical science progresses, ontologies play an increasingly important role in making biomedical information easier to understand. Although many approaches, such as Gene Ontology annotation, have been proposed to help users understand biomedical information through ontologies, most of them do not focus on capturing gene-related terms and their relationships within biomedical document collections. Identifying the key gene-related terms and their semantic relationships is essential for comprehending the conceptual structure of a biomedical document collection and for sparing users from information overload. To address this issue, we propose a novel approach, GOClonto, that automatically generates ontologies to conceptualize biomedical document collections. Based on GO (Gene Ontology), GOClonto extracts gene-related terms from biomedical text, applies latent semantic analysis (LSA) to identify the key gene-related terms, allocates documents according to these key terms, and uses GO to automatically generate a corpus-related gene ontology. The experimental results show that GOClonto is able to identify key gene-related terms. On a test biomedical document collection, GOClonto outperforms other clustering algorithms in terms of F-measure. Moreover, the ontology generated by GOClonto exhibits a significantly informative conceptual structure.
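The LSA-based steps and the F-measure evaluation named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how key terms are chosen from the LSA space or how documents are allocated, so the rules below are assumptions for illustration only. `lsa_key_terms` picks, for each of the top-k singular vectors, the term with the largest absolute loading. `allocate_documents` assigns each document to the key term with the highest cosine similarity in the rank-k space. The toy term-by-document matrix, the GO-style term labels, and all function names are hypothetical. GO-based term extraction and ontology generation are not modeled. `f_measure` is the class-weighted clustering F-measure commonly used to evaluate document clustering; the exact variant used in the paper is not stated in the abstract.

```python
import numpy as np
from collections import Counter

def lsa_key_terms(A, terms, k):
    """Hypothetical rule: for each of the top-k left singular vectors of the
    term-by-document matrix A, take the term with the largest |loading|."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keys = []
    for i in range(k):
        t = terms[int(np.argmax(np.abs(U[:, i])))]
        if t not in keys:  # skip a term already chosen for an earlier concept
            keys.append(t)
    return keys

def allocate_documents(A, terms, key_terms, k):
    """Hypothetical rule: assign each document to the key term with the
    highest cosine similarity in the rank-k LSA space."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    term_vecs = U[:, :k] * s[:k]      # one row per term
    doc_vecs = Vt[:k, :].T * s[:k]    # one row per document
    K = term_vecs[[terms.index(t) for t in key_terms]]
    K = K / (np.linalg.norm(K, axis=1, keepdims=True) + 1e-12)
    D = doc_vecs / (np.linalg.norm(doc_vecs, axis=1, keepdims=True) + 1e-12)
    sims = D @ K.T                    # documents x key terms
    return [key_terms[j] for j in np.argmax(sims, axis=1)]

def f_measure(classes, clusters):
    """Clustering F-measure: match each gold class to its best-scoring
    cluster and weight that F-score by the class size."""
    n = len(classes)
    class_sizes, cluster_sizes = Counter(classes), Counter(clusters)
    joint = Counter(zip(classes, clusters))
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for g, n_g in cluster_sizes.items():
            n_cg = joint[(c, g)]
            if n_cg:
                p, r = n_cg / n_g, n_cg / n_c  # precision, recall
                best = max(best, 2 * p * r / (p + r))
        total += n_c / n * best
    return total

if __name__ == "__main__":
    # Toy counts: rows are (hypothetical) gene-related terms, columns are documents.
    terms = ["apoptotic process", "caspase activity",
             "protein phosphorylation", "kinase activity"]
    A = np.array([[2, 2, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 3, 3],
                  [0, 0, 1, 1]], dtype=float)
    keys = lsa_key_terms(A, terms, k=2)
    alloc = allocate_documents(A, terms, keys, k=2)
    print(keys, alloc)
    print(f_measure(["apoptosis", "apoptosis", "signaling", "signaling"], alloc))
```

Because each term vector and each document vector is built from the same singular vectors, the arbitrary sign that the SVD may give each singular vector flips both sides together, so the cosine similarities, and hence the allocation, do not depend on it.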