Active learning technique for biomedical named entity extraction

Authors:
Sriparna Saha;Asif Ekbal;Mridula Verma;Utpal Sikdar;Massimo Poesio
Affiliations:
IIT Patna, Patna, India;IIT Patna, Patna, India;NIT Patna, Patna, India;IIT Patna, Patna, India;Center for Mind/Brain Sciences, Universita di Trento, Trento, Italy
Venue:
Proceedings of the International Conference on Advances in Computing, Communications and Informatics
Year:
2012

Abstract

One difficulty with machine learning for information extraction is the high cost of collecting labeled examples. Active learning can make more efficient use of the annotator's time by asking them to label only the instances that are most useful to the learner. In the random sampling approach, unlabeled data is selected for annotation at random and thus cannot be expected to yield the desired results. In contrast, active learning selects the data most useful to the classifier from a large pool of unlabeled data. The strategies used often classify corpus tokens (data points) under the wrong classes. The classifier is confused between two categories when a token lies near the margin. We develop a method for solving this problem and show that it improves performance. Our approach is based on a supervised machine learner, the Conditional Random Field (CRF). The proposed approach is applied to the problem of named entity extraction in the biomedical domain. Results show that the proposed active-learning-based technique indeed improves the performance of the system.
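The idea of selecting tokens that lie near the margin can be illustrated with a minimal sketch. This is not the authors' method, which the abstract does not specify in detail; it is a generic margin-based uncertainty sampling example. It assumes that a trained sequence model (for instance a CRF) can supply, for each token, a probability for every candidate label. The names token_margin, sentence_uncertainty, and select_for_annotation, the label names, and the probabilities are all hypothetical.

```python
from typing import Dict, List

# Assumption: each token is described by a mapping from label to probability,
# as might be obtained from the marginals of a trained CRF.
TokenMarginals = Dict[str, float]


def token_margin(marginals: TokenMarginals) -> float:
    """Gap between the two most probable labels.

    A small gap means the classifier is torn between two categories,
    i.e. the token lies near the decision margin.
    """
    if not marginals:
        raise ValueError("a token needs at least one candidate label")
    top_two = sorted(marginals.values(), reverse=True)[:2]
    if len(top_two) == 1:
        return top_two[0]
    return top_two[0] - top_two[1]


def sentence_uncertainty(sentence: List[TokenMarginals]) -> float:
    """Score a non-empty sentence by its most ambiguous token (smallest margin)."""
    return min(token_margin(m) for m in sentence)


def select_for_annotation(pool: List[List[TokenMarginals]], k: int) -> List[int]:
    """Return the indices of the k sentences with the smallest margins."""
    ranked = sorted(range(len(pool)), key=lambda i: sentence_uncertainty(pool[i]))
    return ranked[:k]


# Hypothetical unlabeled pool of three sentences with made-up probabilities.
pool = [
    [{"B-protein": 0.95, "O": 0.05}, {"O": 0.99, "B-DNA": 0.01}],
    [{"B-protein": 0.48, "B-DNA": 0.46, "O": 0.06}, {"O": 0.90, "I-DNA": 0.10}],
    [{"O": 0.70, "B-cell_type": 0.30}],
]

chosen = select_for_annotation(pool, k=2)
print(chosen)
```

In this sketch the second sentence is ranked first because one of its tokens is almost equally likely to be a protein or a DNA mention. In an active learning loop, the selected sentences would be sent to an annotator, added to the training set, and the model retrained before the next round of selection.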