Boosting performance of gene mention tagging system by hybrid methods

Authors:
Lishuang Li;Wenting Fan;Degen Huang;Yanzhong Dang;Jing Sun
Affiliations:
School of Computer Science and Technology, Dalian University of Technology, 116023 Dalian, China;School of Computer Science and Technology, Dalian University of Technology, 116023 Dalian, China;School of Computer Science and Technology, Dalian University of Technology, 116023 Dalian, China;School of Management Science and Engineering, Dalian University of Technology, 116023 Dalian, China;School of Computer Science and Technology, Dalian University of Technology, 116023 Dalian, China
Venue:
Journal of Biomedical Informatics
Year:
2012



Abstract

Named entity recognition (NER) in biomedical literature is currently an active research problem in natural language processing (NLP). To achieve higher performance, we present a hybrid experimental framework for the gene mention tagging task. Six classifiers are first constructed using four toolkits (CRF++, YamCha, Maximum Entropy (ME) and MALLET) with different training methods and feature sets, and are then combined using three hybrid methods: a simple set operation method, a voting method and a two-layer stacking method. Experiments on the BioCreative II GM task corpus show that the three hybrid methods achieve F-measures of 87.40%, 87.31% and 87.70%, respectively, without any post-processing, all higher than that of any single classifier. Our best hybrid method (two-layer stacking) achieves an F-measure of 88.42% after post-processing, which outperforms most state-of-the-art systems. We also discuss how the number, performance and divergence of the single classifiers affect the ensemble system in each hybrid method, and analyze why our hybrid models improve performance.
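
As a rough illustration of two of the combination strategies named in the abstract, the sketch below merges the gene mentions predicted by several taggers using set operations and voting. It is not the authors' implementation: representing a mention as a `(start, end)` character span, the function names, and the strict-majority default threshold are all assumptions made for this example, and the abstract does not specify which set operations or voting weights were actually used.

```python
from collections import Counter


def union_combine(predictions):
    """Set-operation combination: keep a span if ANY classifier predicts it."""
    return set().union(*predictions)


def intersection_combine(predictions):
    """Set-operation combination: keep a span only if ALL classifiers predict it.

    Assumes at least one classifier output is supplied.
    """
    return set.intersection(*map(set, predictions))


def vote_combine(predictions, min_votes=None):
    """Voting combination: keep spans predicted by at least `min_votes` classifiers.

    The default threshold (a strict majority) is an assumption of this sketch.
    """
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1
    counts = Counter(span for pred in predictions for span in set(pred))
    return {span for span, n in counts.items() if n >= min_votes}


def f_measure(predicted, gold):
    """Balanced F-measure over exact span matches."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical outputs of three taggers on one sentence.
predictions = [
    {(0, 5), (10, 14)},
    {(0, 5)},
    {(0, 5), (20, 25)},
]
gold = {(0, 5), (10, 14)}

for name, combine in [("union", union_combine),
                      ("intersection", intersection_combine),
                      ("vote", vote_combine)]:
    merged = combine(predictions)
    print(name, sorted(merged), round(f_measure(merged, gold), 4))
```

Union tends to favor recall and intersection precision, with voting in between. A two-layer stacking method would instead train a second classifier on the first-layer outputs, which this sketch does not attempt.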