Headwords and suffixes in biomedical names

Authors:
Manabu Torii;Hongfang Liu
Affiliations:
Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University Medical Center, Washington, DC;Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University Medical Center, Washington, DC
Venue:
KDLL'06 Proceedings of the 2006 international conference on Knowledge Discovery in Life Science Literature
Year:
2006

Abstract

Natural Language Processing (NLP) techniques have been used to extract and mine knowledge from the biomedical literature. A critical part of this task is biomedical named entity tagging (BNER), which usually consists of two steps: identifying biomedical names in text, and assigning predefined semantic classes to the names identified in the first step. BNER systems frequently use headwords and suffixes as features for assigning semantic classes to names in text. However, few studies have evaluated how well headwords and suffixes predict the semantic classes of biomedical terms when knowledge sources are used in an unsupervised way. We conducted such a study using names in the Unified Medical Language System (UMLS), where the semantic classes associated with these names were obtained by modifying an existing UMLS semantic group system and incorporating the GENIA ontology. We define headwords and suffixes that are significantly associated with a specific semantic class as semantic suffixes. Semantic class assignment using semantic suffixes achieved an F-measure of 86.4%, with a precision of 91.6% and a recall of 81.7%. When the semantic suffixes obtained from the UMLS were applied to names extracted from the GENIA corpus, the system achieved an F-measure of 73.4%, with a precision of 84.2% and a recall of 65.1%. These performance measures could be improved dramatically when evaluation is limited to names associated with classes that have corresponding GENIA types.
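
As a rough illustration of the ideas in the abstract, the following Python sketch shows how a headword or suffix lookup might assign a semantic class, and checks that the reported F-measures agree with the reported precision and recall under the balanced harmonic-mean definition. This is not the authors' implementation. The table `TOY_SEMANTIC_SUFFIXES`, the functions `headword` and `assign_class`, and the rule that the headword is the last whitespace-separated token are all assumptions made for illustration only; the abstract does not give the actual semantic suffixes or the statistical test used to select them.

```python
def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical entries for illustration; NOT taken from the paper.
TOY_SEMANTIC_SUFFIXES = {
    "kinase": "protein",   # a whole headword
    "itis": "disorder",    # a character suffix
}


def headword(name):
    """Assumed simplification: the headword is the last token of the name."""
    tokens = name.lower().split()
    return tokens[-1] if tokens else ""


def assign_class(name, table=TOY_SEMANTIC_SUFFIXES):
    """Return the class of the headword, else of its longest matching suffix."""
    head = headword(name)
    if head in table:
        return table[head]
    # Try proper suffixes of the headword, longest first.
    for length in range(len(head) - 1, 0, -1):
        if head[-length:] in table:
            return table[head[-length:]]
    return None  # no semantic suffix applies


# Consistency check on the figures reported in the abstract.
print(round(100 * f_measure(0.916, 0.817), 1))  # UMLS names: 86.4
print(round(100 * f_measure(0.842, 0.651), 1))  # GENIA names: 73.4
```

With the toy table, `assign_class("protein kinase")` returns `"protein"` through the headword, `assign_class("hepatitis")` returns `"disorder"` through the suffix, and a name with no matching entry returns `None`. Unmatched names of this kind are one way a method like this can lose recall while keeping precision high, which is the pattern in the reported figures.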