Boosting the protein name recognition performance by bootstrapping on selected text

Authors:
Yue Wang;Jin-Dong Kim
Affiliations:
Database Center for Life Science, Research Organization of Information and Systems, Yayoi, Bunkyo-ku, Tokyo, Japan;Database Center for Life Science, Research Organization of Information and Systems, Yayoi, Bunkyo-ku, Tokyo, Japan
Venue:
BioNLP '12 Proceedings of the 2012 Workshop on Biomedical Natural Language Processing
Year:
2012

Abstract

When only a small amount of manually annotated data is available, bootstrapping is often considered as a way to compensate for the lack of training material for a machine-learning method. This paper reports a series of bootstrapping experiments for protein name recognition. The results show that performance changes significantly depending on the choice of text collection from which the bootstrapped training samples are drawn, and that an improvement is obtained only with a well-chosen text collection.