Unsupervised word sense disambiguation in biomedical texts with co-occurrence network and graph kernel

  • Authors: Tae-Gil Noh; Seong-Bae Park; Sang-Jo Lee
  • Affiliations: Kyungpook National University, Daegu, South Korea (all three authors)
  • Venue: DTMBIO '10: Proceedings of the ACM Fourth International Workshop on Data and Text Mining in Biomedical Informatics
  • Year: 2010

Abstract

This paper proposes an unsupervised word sense disambiguation method for the biomedical domain. A network representation of co-occurrence data is first defined to represent both word senses and word contexts. The representation expresses the textual context observed around a given term as a network in which nodes are terms and each edge is weighted by the number of co-occurrences between the two terms it connects. A graph kernel is adopted as the similarity measure between terms and senses represented as networks. Candidate senses and ambiguous contexts are then compared directly in this representation space to resolve the word sense. The method requires only sense definitions and a large amount of unlabeled text. Experiments in the biomedical domain show that the method outperforms a baseline that uses a vector representation, and that its performance is comparable to state-of-the-art unsupervised word sense disambiguation methods.
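
The pipeline described in the abstract (build co-occurrence networks, compare them with a kernel, pick the closest sense) can be illustrated with a minimal sketch. The abstract does not say which graph kernel the authors use, so the example below substitutes a normalized dot product over shared edge weights, which is one simple valid kernel and not necessarily theirs. The function names, the sentence-level co-occurrence window, and the toy sentences for the term "cold" are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
import math


def cooccurrence_network(sentences):
    """Build a weighted co-occurrence network from tokenized sentences.

    Nodes are terms; the weight of edge (u, v) is the number of
    sentences in which u and v appear together (assumed window).
    """
    edges = Counter()
    for tokens in sentences:
        # Sort so that each undirected edge has one canonical key.
        for u, v in combinations(sorted(set(tokens)), 2):
            edges[(u, v)] += 1
    return edges


def edge_kernel(g1, g2):
    """Stand-in graph kernel: cosine similarity of edge-weight vectors."""
    dot = sum(w * g2[e] for e, w in g1.items() if e in g2)
    norm1 = math.sqrt(sum(w * w for w in g1.values()))
    norm2 = math.sqrt(sum(w * w for w in g2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


def disambiguate(context_net, sense_nets):
    """Return the sense whose network is most similar to the context."""
    return max(sense_nets, key=lambda s: edge_kernel(context_net, sense_nets[s]))


# Toy data (hypothetical): two senses of the ambiguous term "cold".
sense_nets = {
    "common_cold": cooccurrence_network([
        ["cold", "virus", "infection", "cough"],
        ["virus", "infection", "fever"],
    ]),
    "low_temperature": cooccurrence_network([
        ["cold", "temperature", "exposure", "hypothermia"],
        ["temperature", "exposure", "frostbite"],
    ]),
}
context_net = cooccurrence_network([
    ["patient", "cold", "cough", "virus", "infection"],
])

# Selects "common_cold" for this toy data: the context shares edges
# only with that sense network.
best_sense = disambiguate(context_net, sense_nets)
```

In a realistic setting the sense networks would be built from sense definitions expanded with a large unlabeled corpus, as the abstract indicates, and a more expressive graph kernel would likely replace the edge-overlap stand-in used here.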