Lexicon-free and context-free drug names identification methods using hidden Markov models and pointwise mutual information

Authors:
Jacek Malyszko;Agata Filipowska
Affiliations:
Poznan University of Economics, Poznan, Poland;Poznan University of Economics, Poznan, Poland
Venue:
Proceedings of the ACM sixth international workshop on Data and text mining in biomedical informatics
Year:
2012

Citing 2
Cited 1

Extracting the names of genes and gene products with a hidden Markov model
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1

Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics

DTMBIO 2012: international workshop on data and text mining in biomedical informatics

Proceedings of the 21st ACM international conference on Information and knowledge management

Abstract

This paper addresses the extraction of medicine names from free-text documents written in Polish. Lexicon-based approaches cannot identify unknown or misspelled medicine names. We present the results of experiments with two methods: a Hidden Markov Model (HMM) and an approach based on Pointwise Mutual Information (PMI). The goal was to identify medicine names without using a lexicon or contextual information. The results show that the HMM may be used as one of several steps in drug name identification (with an F-score slightly below 70% on the test set), while PMI can help increase the precision of the results achieved with the HMM, but at the cost of a significant loss in recall.
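
For readers unfamiliar with the two quantities named in the abstract, the following is a minimal sketch of the standard textbook definitions of PMI and the F-score. It is not the authors' implementation: the abstract does not say which events the PMI is computed over or how it is combined with the HMM output, so the function names, the count-based arguments, and the example numbers are all assumptions made purely for illustration.

```python
import math


def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information (base 2) estimated from raw counts.

    PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) )
    with p(.) estimated as count / total. What x and y stand for
    (e.g. a candidate token and some reference term) is an assumption;
    the abstract does not specify it.
    """
    if min(count_xy, count_x, count_y, total) <= 0:
        raise ValueError("all counts must be positive")
    # Algebraically equal to p(x, y) / (p(x) * p(y)), with a single division.
    return math.log2(count_xy * total / (count_x * count_y))


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical numbers, chosen only to illustrate the trade-off the abstract
# describes: higher precision bought with a large drop in recall.
baseline = f_score(precision=0.70, recall=0.70)
filtered = f_score(precision=0.90, recall=0.40)
```

With these made-up inputs, the filtered configuration has higher precision but a lower F-score than the baseline, which is the kind of precision/recall trade-off the abstract reports for adding PMI on top of the HMM.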