Multi-Criterion Active Learning in Conditional Random Fields

Authors:
Christopher T. Symons; Nagiza F. Samatova; Ramya Krishnamurthy; Byung H. Park; Tarik Umar; David Buttler; Terence Critchlow; David Hysom
Affiliations:
Oak Ridge National Laboratory, USA; Oak Ridge National Laboratory, USA; Oak Ridge National Laboratory, USA; Oak Ridge National Laboratory, USA; Oak Ridge National Laboratory, USA; Lawrence Livermore National Laboratory, USA; Lawrence Livermore National Laboratory, USA; Lawrence Livermore National Laboratory, USA
Venue:
ICTAI '06 Proceedings of the 18th IEEE International Conference on Tools with Artificial Intelligence
Year:
2006

Citing 0
Cited 2

Maximum Margin Active Learning for Sequence Labeling with Different Length
ICDM '08 Proceedings of the 8th industrial conference on Advances in Data Mining: Medical Applications, E-Commerce, Marketing, and Theoretical Aspects

Towards a SVM-struct Based Active Learning Algorithm for Least Cost Semantic Annotation
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 03

Quantified Score

Hi-index	0.00

Abstract

Conditional Random Fields (CRFs), which are popular supervised learning models for many Natural Language Processing (NLP) tasks, typically require a large collection of labeled data for training. In practice, however, manual annotation of text documents is quite costly. Furthermore, even large labeled training sets can have arbitrarily limited performance peaks if they are not chosen with care. This paper considers the use of multi-criterion active learning for identification of a small but sufficient set of text samples for training CRFs. Our empirical results demonstrate that our method is capable of reducing the manual annotation costs, while also limiting the retraining costs that are often associated with active learning. In addition, we show that the generalization performance of CRFs can be enhanced through judicious selection of training examples.