Medical entity recognition: a comparison of semantic and statistical methods

Authors:
Asma Ben Abacha; Pierre Zweigenbaum
Affiliations:
LIMSI-CNRS, Orsay Cedex, France; LIMSI-CNRS, Orsay Cedex, France
Venue:
BioNLP '11 Proceedings of BioNLP 2011 Workshop
Year:
2011


Abstract

Medical entity recognition is a crucial step towards efficient analysis of medical texts. In this paper we present and compare three methods based on domain knowledge and machine learning techniques. Through these approaches we study two research directions: (i) noun phrases are first extracted with a chunker and then classified in a final step, and (ii) machine learning techniques are used to identify entity boundaries and categories simultaneously. Each approach is tested on a standard corpus of clinical texts. The results show that the hybrid approach, which combines machine learning with domain knowledge, performs best.
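The two directions in the abstract differ in where entity boundaries come from. The following is a minimal sketch of that contrast, not the authors' implementation. It assumes a toy lexicon in place of real domain knowledge, hand-written chunk offsets in place of a chunker, and hand-written BIO tags in place of a trained tagger. The names LEXICON, classify_chunks and decode_bio and the labels PROBLEM and TREATMENT are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, category)

# Hypothetical toy lexicon standing in for domain knowledge (assumption).
LEXICON: Dict[str, str] = {"aspirin": "TREATMENT", "chest pain": "PROBLEM"}


def classify_chunks(tokens: List[str], chunks: List[Tuple[int, int]]) -> List[Span]:
    """Direction (i): a chunker supplies boundaries; each noun phrase is then classified."""
    spans: List[Span] = []
    for start, end in chunks:
        phrase = " ".join(tokens[start:end]).lower()
        category = LEXICON.get(phrase)
        if category is not None:  # phrases outside the lexicon are not entities
            spans.append((start, end, category))
    return spans


def decode_bio(tags: List[str]) -> List[Span]:
    """Direction (ii): one tag per token encodes boundary and category together."""
    spans: List[Span] = []
    start: Optional[int] = None
    category: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and category == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, category))
            start, category = None, None
        if tag.startswith("B-"):
            start, category = i, tag[2:]
    return spans


tokens = ["Patient", "reports", "chest", "pain", "after", "aspirin"]
# Assumed chunker output: noun-phrase offsets over the tokens above.
chunks = [(0, 1), (2, 4), (5, 6)]
# Assumed tagger output: one BIO tag per token.
tags = ["O", "O", "B-PROBLEM", "I-PROBLEM", "O", "B-TREATMENT"]

two_step = classify_chunks(tokens, chunks)
joint = decode_bio(tags)
```

On this toy sentence both routes yield the same two entity spans. In practice they fail differently: the two-step route cannot recover an entity the chunker splits or misses, while the joint route depends entirely on the quality of the learned tag sequence.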