A sequential algorithm for training text classifiers
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
A maximum entropy approach to natural language processing
Computational Linguistics
Selective Sampling Using the Query by Committee Algorithm
Machine Learning
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Minimizing manual annotation cost in supervised training from corpora
ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Rule writing or annotation: cost-efficient resource usage for base noun phrase chunking
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Sample Selection for Statistical Parsing
Computational Linguistics
The class imbalance problem: A systematic study
Intelligent Data Analysis
Learning on the border: active learning in imbalanced data classification
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
On proper unit selection in active learning: co-selection effects for named entity recognition
HLT '09 Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing
NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
SMOTE: synthetic minority over-sampling technique
Journal of Artificial Intelligence Research
Active learning with statistical models
Journal of Artificial Intelligence Research
Active learning for part-of-speech tagging: accelerating corpus annotation
LAW '07 Proceedings of the Linguistic Annotation Workshop
The foundations of cost-sensitive learning
IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Bringing active learning to life
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Inactive learning?: difficulties employing active learning in practice
ACM SIGKDD Explorations Newsletter
Evaluating the impact of coder errors on active learning
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Generating balanced classifier-independent training samples from unlabeled data
PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I
Hi-index | 0.00 |
In lots of natural language processing tasks, the classes to be dealt with often occur heavily imbalanced in the underlying data set and classifiers trained on such skewed data tend to exhibit poor performance for low-frequency classes. We introduce and compare different approaches to reduce class imbalance by design within the context of active learning (AL). Our goal is to compile more balanced data sets up front during annotation time when AL is used as a strategy to acquire training material. We situate our approach in the context of named entity recognition. Our experiments reveal that we can indeed reduce class imbalance and increase the performance of classifiers on minority classes while preserving a good overall performance in terms of macro F-score.