COLT '92 Proceedings of the fifth annual workshop on Computational learning theory
Improving Generalization with Active Learning
Machine Learning - Special issue on structured connectionist systems
The nature of statistical learning theory
Selective Sampling Using the Query by Committee Algorithm
Machine Learning
Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond
Introduction to Modern Information Retrieval
Selective Sampling with Redundant Views
Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence
Diverse ensembles for active learning
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Near-duplicate detection for eRulemaking
dg.o '05 Proceedings of the 2005 national conference on Digital government research
Why inverse document frequency?
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Multidimensional text analysis for eRulemaking
dg.o '06 Proceedings of the 2006 international conference on Digital government research
Automated classification of congressional legislation
dg.o '06 Proceedings of the 2006 international conference on Digital government research
Near-duplicate detection by instance-level constrained clustering
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Information acquisition using multiple classifications
Proceedings of the 4th international conference on Knowledge capture
A study in rule-specific issue categorization for e-rulemaking
dg.o '08 Proceedings of the 2008 international conference on Digital government research
That is your evidence?: Classifying stance in online political debate
Decision Support Systems
We address the e-rulemaking problem of reducing the manual labor required to analyze public comment sets. In current and previous work, for example, text categorization techniques have been used to speed up the comment analysis phase of e-rulemaking by automatically classifying sentences according to the rule-specific issues [2] or general topics [7, 8] that they address. Manually annotated data, however, is still required to train the supervised inductive learning algorithms that perform the categorization. This paper therefore investigates active learning methods for public comment categorization: we develop two new, general-purpose active learning techniques that selectively sample from the available training data for human labeling when building the sentence-level classifiers used in public comment categorization. Using an e-rulemaking corpus developed for our purposes [2], we compare our methods to the well-known query by committee (QBC) active learning algorithm [5] and to a baseline that randomly selects instances for labeling in each round of active learning. We show that our methods outperform both the random-selection active learner and the QBC variation by a statistically significant margin, requiring many fewer training examples to reach the same levels of accuracy on a held-out test set. This provides promising evidence that automated text categorization methods might be used effectively to support public comment analysis.
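To make the two comparison points concrete, the following is a minimal sketch of how one round of instance selection might look for the random baseline and for a QBC-style learner. It does not reproduce the two new techniques the abstract mentions, which are not described here. The function names, the use of vote entropy as the disagreement measure, and the toy threshold "classifiers" are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def vote_entropy(votes):
    """Entropy of the committee's label votes for one instance.

    Zero when the committee is unanimous, larger when the votes are split.
    (Vote entropy is one common QBC disagreement measure; assumed here.)
    """
    total = len(votes)
    counts = Counter(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_qbc(pool, committee, batch_size):
    """Return indices of the pool instances the committee disagrees on most.

    `committee` is a list of callables mapping an instance to a label
    (hypothetical stand-ins for trained sentence-level classifiers).
    """
    scored = []
    for index, instance in enumerate(pool):
        votes = [member(instance) for member in committee]
        scored.append((vote_entropy(votes), index))
    # Highest disagreement first; ties broken by position in the pool.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:batch_size]]


def select_random(pool, batch_size, rng):
    """Baseline: return indices of instances chosen uniformly at random."""
    return rng.sample(range(len(pool)), batch_size)


# Toy illustration: instances are numbers, committee members are thresholds.
toy_pool = [0.1, 0.4, 0.5, 0.6, 0.9]
toy_committee = [lambda x, t=t: x > t for t in (0.3, 0.5, 0.7)]
qbc_choice = select_qbc(toy_pool, toy_committee, batch_size=2)
random_choice = select_random(toy_pool, batch_size=2, rng=random.Random(0))
```

In a full active learning loop, the selected instances would be sent to a human annotator, added to the labeled set, and the classifiers retrained before the next round. The sketch covers only the selection step that distinguishes the two strategies.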