Disease mention recognition with specific features

Authors:
Md. Faisal Mahbub Chowdhury;Alberto Lavelli
Affiliations:
Human Language Technology Research Unit, Fondazione Bruno Kessler, Trento, Italy and University of Trento, Italy;University of Trento, Italy
Venue:
BioNLP '10 Proceedings of the 2010 Workshop on Biomedical Natural Language Processing
Year:
2010

Abstract

Despite an increasing amount of research on biomedical named entity recognition, not enough work has been done on disease mention recognition. The difficulty of obtaining adequate corpora is one of the key reasons that have hindered this research. Previous studies argue that correct identification of disease mentions is the key issue for further improvement of disease-centric knowledge extraction tasks. In this paper, we present a machine-learning-based approach that uses a feature set tailored for disease mention recognition and outperforms the state-of-the-art results. The paper also discusses why a feature set for the well-studied gene/protein mention recognition task is not necessarily equally effective for other biomedical semantic types such as diseases.