Text Categorization of Biomedical Data Sets Using Graph Kernels and a Controlled Vocabulary

  • Authors:
  • Said Bleik;Meenakshi Mishra;Jun Huan;Min Song

  • Affiliations:
  • New Jersey Institute of Technology, Newark;University of Kansas, Lawrence;University of Kansas, Lawrence;Yonsei University, Seoul

  • Venue:
  • IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
  • Year:
  • 2013

Abstract

Recently, graph representations of text have shown improved performance over conventional bag-of-words representations in text categorization applications. In this paper, we present a graph-based representation for biomedical articles and use graph kernels to classify those articles into high-level categories. In our representation, common biomedical concepts and semantic relationships are identified with the help of an existing ontology and are used to build a rich graph structure that provides a consistent feature set and preserves additional semantic information that could improve a classifier's performance. We classify the graphs using both a set-based graph kernel that can handle the disconnected nature of the graphs and a simple linear kernel. Finally, we compare the classification performance of the kernel classifiers with that of common text-based classifiers.
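
The abstract does not define the kernels it uses, so the following is only a minimal sketch of what a set-based kernel and a linear kernel could look like, not the authors' formulation. It assumes each article is reduced to a set of concept nodes and a set of labeled relation edges; the concept names, relation labels, and the helper names make_graph, set_kernel, and linear_kernel are all hypothetical.

```python
from collections import Counter


def make_graph(nodes, edges):
    """Store a document graph as (concept nodes, labeled edges). Hypothetical helper."""
    return frozenset(nodes), frozenset(edges)


def set_kernel(g1, g2):
    """Count shared concept nodes plus shared labeled edges.

    This equals a dot product of binary indicator vectors, so it is a
    valid kernel, and it never requires the graphs to be connected.
    It is an illustrative stand-in, not the kernel from the paper.
    """
    (nodes1, edges1), (nodes2, edges2) = g1, g2
    return len(nodes1 & nodes2) + len(edges1 & edges2)


def linear_kernel(counts1, counts2):
    """Dot product of concept-frequency vectors stored as Counters."""
    shared = counts1.keys() & counts2.keys()
    return sum(counts1[c] * counts2[c] for c in shared)


# Toy documents with invented concepts and relations.
doc_a = make_graph(
    {"gene", "protein", "disease"},
    {("gene", "encodes", "protein"), ("protein", "associated_with", "disease")},
)
doc_b = make_graph(
    {"gene", "protein", "drug"},
    {("gene", "encodes", "protein"), ("drug", "targets", "protein")},
)

# Two shared nodes (gene, protein) and one shared edge.
k_set = set_kernel(doc_a, doc_b)

# Concept frequencies only, ignoring relations.
counts_a = Counter({"gene": 2, "protein": 1, "disease": 3})
counts_b = Counter({"gene": 1, "protein": 4, "drug": 1})
k_lin = linear_kernel(counts_a, counts_b)
```

In this sketch the set-based kernel rewards shared relations as well as shared concepts, while the linear kernel sees only concept frequencies. Either function could fill a precomputed Gram matrix for a kernel classifier.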