Identifying chemical names in biomedical text: an investigation of the substring co-occurrence based approaches

  • Authors: Alexander Vasserman
  • Affiliations: University of Pennsylvania, Philadelphia, PA
  • Venue: HLT-SRWS '04 Proceedings of the Student Research Workshop at HLT-NAACL 2004
  • Year: 2004

Abstract

We investigate strategies for finding chemical names in biomedical text using substring co-occurrence information. The goal is to build a system from readily available data with minimal human involvement. Our models are trained on a dictionary of chemical names and on general biomedical text. The strategies we examine include Naïve Bayes classifiers and several types of N-gram models. We introduce a new way of interpolating N-grams that does not require tuning any parameters. We also find the task to be similar to Language Identification.
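
To make the substring idea concrete, the following is a minimal sketch of a character N-gram classifier in the spirit of Language Identification: one model is trained on chemical names, another on general words, and a token is assigned to whichever model gives it the higher likelihood. Everything here is an illustrative assumption and not the paper's implementation: the word lists are toy data, the names `CharNgramModel`, `char_ngrams`, and `classify` are hypothetical, and plain add-one smoothing stands in for the parameter-free interpolation scheme, which the abstract does not describe.

```python
import math
from collections import Counter

# Toy training data (assumption): stand-ins for a chemical dictionary
# and for words drawn from general biomedical text.
CHEMICAL_NAMES = ["ethanol", "methanol", "propanol", "butanol",
                  "acetaldehyde", "formaldehyde", "benzaldehyde",
                  "acetylcholine"]
GENERAL_WORDS = ["patient", "treatment", "protein", "receptor", "response",
                 "clinical", "disease", "expression", "study", "results"]


def char_ngrams(word, n):
    """Return the character n-grams of a word, padded with boundary markers."""
    padded = "^" * (n - 1) + word.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Character n-gram model with add-one smoothing (an assumed, simple choice)."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = Counter()    # counts of full n-grams
        self.context_counts = Counter()  # counts of the (n-1)-character prefixes
        self.alphabet = set()            # characters seen in predicted position

    def train(self, words):
        for word in words:
            for gram in char_ngrams(word, self.n):
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1
                self.alphabet.add(gram[-1])

    def log_prob(self, word, vocab_size):
        """Sum of smoothed log P(char | previous n-1 chars) over the word."""
        total = 0.0
        for gram in char_ngrams(word, self.n):
            numerator = self.ngram_counts[gram] + 1
            denominator = self.context_counts[gram[:-1]] + vocab_size
            total += math.log(numerator / denominator)
        return total


def classify(word, chem_model, text_model):
    """Label a word by comparing its likelihood under the two models."""
    # Shared vocabulary size; the extra 1 reserves mass for unseen characters.
    vocab_size = len(chem_model.alphabet | text_model.alphabet) + 1
    chem_score = chem_model.log_prob(word, vocab_size)
    text_score = text_model.log_prob(word, vocab_size)
    return "chemical" if chem_score > text_score else "other"


chem_model = CharNgramModel(n=3)
chem_model.train(CHEMICAL_NAMES)
text_model = CharNgramModel(n=3)
text_model.train(GENERAL_WORDS)

for token in ["pentanol", "patients"]:
    print(token, classify(token, chem_model, text_model))
```

Neither test token appears in the training lists, so any decision rests entirely on shared substrings such as "ano" and "nol". With equal class priors, comparing the two likelihoods amounts to a Naïve Bayes style decision over N-gram features; a realistic system would need far larger training sets and a more careful smoothing or interpolation method.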