Lexicon-free and context-free drug names identification methods using hidden Markov models and pointwise mutual information

  • Authors:
  • Jacek Malyszko; Agata Filipowska

  • Affiliations:
  • Poznan University of Economics, Poznan, Poland; Poznan University of Economics, Poznan, Poland

  • Venue:
  • Proceedings of the ACM sixth international workshop on Data and text mining in biomedical informatics
  • Year:
  • 2012

Abstract

The paper addresses the extraction of medicine names from free-text documents written in Polish. Lexicon-based approaches cannot identify unknown or misspelled medicine names. We present the results of experiments with two methods: a Hidden Markov Model (HMM) and an approach based on Pointwise Mutual Information (PMI). The goal was to identify medicine names without using a lexicon or contextual information. The results show that the HMM can serve as one of several steps in drug name identification (with an F-score slightly below 70% on the test set), while PMI can increase the precision of the results obtained with the HMM, but at the cost of a significant loss in recall.
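
The abstract does not say which units the PMI is computed over or how its scores are thresholded. As a generic illustration only, and not the authors' method, the sketch below assumes PMI over adjacent character pairs, defined in the standard way as log2(p(a,b) / (p(a) p(b))). The function names `pmi_table` and `mean_pmi` and the toy word list are hypothetical.

```python
import math
from collections import Counter


def pmi_table(words):
    """Return PMI (base 2) for every adjacent character pair seen in `words`.

    Assumption: probabilities are plain relative frequencies, with no smoothing.
    """
    unigrams = Counter()
    bigrams = Counter()
    for w in words:
        unigrams.update(w)                # count single characters
        bigrams.update(zip(w, w[1:]))     # count adjacent character pairs
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    table = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        table[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return table


def mean_pmi(word, table, unseen=0.0):
    """Average PMI over the adjacent pairs of `word`.

    Pairs missing from `table` contribute `unseen` (a hypothetical default).
    """
    pairs = list(zip(word, word[1:]))
    if not pairs:
        return unseen
    return sum(table.get(p, unseen) for p in pairs) / len(pairs)


# Toy usage: build the table from a tiny word list and score one word.
table = pmi_table(["abab", "cd"])
score = mean_pmi("cd", table)
```

A score of this kind could in principle be used to filter candidates proposed by another model, which would match the precision-for-recall trade-off the abstract reports; the actual filtering rule used in the paper is not stated here.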