Reverse active learning for optimising information extraction training production

Authors: Dung Nguyen; Jon Patrick
Affiliations: School of IT, University of Sydney, Sydney, NSW, Australia (both authors)
Venue: AI'12: Proceedings of the 25th Australasian Joint Conference on Advances in Artificial Intelligence
Year: 2012

Abstract

A noisy corpus, such as a collection of clinical texts, usually contains a large number of misspelt words, abbreviations and acronyms. The training data needed for supervised learning on such a corpus also contains many ambiguous and irregular language usages. These two frequent kinds of noise can degrade the overall performance of a machine learning process. The first kind is usually filtered out by proofreading. This paper proposes an algorithm for the second kind, the noisy training data problem, using a method we call reverse active learning to improve the performance of supervised machine learning on clinical corpora. Applied to the i2b2 clinical corpus, reverse active learning is shown to produce state-of-the-art results for a supervised learning method, and it offers a means of improving all processing strategies in clinical language processing.