A study in rule-specific issue categorization for e-rulemaking

Authors:
Claire Cardie; Cynthia Farina; Adil Aijaz; Matt Rawding; Stephen Purpura
Affiliations:
Cornell University, Ithaca, NY (all authors)
Venue:
dg.o '08 Proceedings of the 2008 international conference on Digital government research
Year:
2008

References:
The nature of statistical learning theory.
Hierarchical classification of Web content. SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval.
Machine learning in automated text categorization. ACM Computing Surveys (CSUR).
Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms.
Improving Short-Text Classification using Unlabeled Data for Classification Problems. ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning.
Near-duplicate detection for eRulemaking. dg.o '05 Proceedings of the 2005 national conference on Digital government research.
A web-based kernel function for measuring the similarity of short text snippets. Proceedings of the 15th international conference on World Wide Web.
Multidimensional text analysis for eRulemaking. dg.o '06 Proceedings of the 2006 international conference on Digital government research.
Near-duplicate detection by instance-level constrained clustering. SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval.
Information acquisition using multiple classifications. Proceedings of the 4th international conference on Knowledge capture.

Active learning for e-rulemaking: public comment categorization

dg.o '08 Proceedings of the 2008 international conference on Digital government research
Recognizing arguing subjectivity and argument tags

ExProM '12 Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics

Quantified Score

Hi-index	0.00

Visualization

Abstract

We address the e-rulemaking problem of categorizing public comments according to the issues they address. In contrast to previous text categorization research in e-rulemaking [5, 6], and in an attempt to mirror more closely the comment analysis process in federal agencies, we employ a set of rule-specific categories, each of which corresponds to a significant issue raised in the comments. We describe the creation of a corpus to support this text categorization task and report inter-annotator agreement results for a group of six annotators. We outline the features of the task and of the e-rulemaking context that give rise to both a non-traditional text categorization corpus and a correspondingly difficult machine learning problem. Finally, we apply standard and hierarchical text categorization techniques to the e-rulemaking data sets and find that automatic categorization shows promise as a means of reducing the manual labor required to analyze large comment sets: the automatic methods approach the performance of human annotators for both flat and hierarchical issue categorization.
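The abstract reports inter-annotator agreement for six annotators but does not say which statistic was used. As a minimal sketch, assuming each annotator assigns exactly one issue category per comment, agreement can be summarized as the mean of Cohen's kappa over all annotator pairs (with six annotators that is 15 pairs). The function names and toy labels below are hypothetical, and the study may have used a different measure, such as Fleiss' kappa or a multi-label variant.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators who labeled the same items, one label each."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both annotators independently pick the same label.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined,
        # so report perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(annotations):
    """Average Cohen's kappa over every pair of annotators."""
    if len(annotations) < 2:
        raise ValueError("need at least two annotators")
    pairs = list(combinations(annotations, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical labels from three annotators on four comments.
a1 = ["cost", "cost", "safety", "safety"]
a2 = ["cost", "cost", "safety", "safety"]
a3 = ["cost", "safety", "cost", "safety"]
print(mean_pairwise_kappa([a1, a2, a3]))  # 0.3333333333333333
```

Kappa corrects raw agreement for the agreement expected by chance given each annotator's label frequencies, which matters when a few issues dominate a comment set.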
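The abstract contrasts flat issue categorization, which chooses directly among all rule-specific issues, with hierarchical categorization. The sketch below illustrates one common hierarchical strategy, top-down classification, under stated assumptions: a two-level issue hierarchy, single-label comments, and multinomial Naive Bayes as the base learner, swapped in for brevity (the abstract does not name the learners used, and the reference list includes SVM texts). The classes, category names, and toy comments are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive lowercase whitespace tokenizer (punctuation is not stripped).
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        words = [w for w in tokenize(doc) if w in self.vocab]  # ignore unseen words

        def log_score(c):
            score = math.log(self.class_counts[c] / n_docs)
            for w in words:
                score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            return score

        return max(self.class_counts, key=log_score)


class TopDownClassifier:
    """Two-level top-down classifier: pick a parent issue, then a child within it."""

    def __init__(self, parent_of):
        self.parent_of = parent_of  # maps each leaf category to its parent category

    def fit(self, docs, leaf_labels):
        parents = [self.parent_of[y] for y in leaf_labels]
        self.root = NaiveBayes().fit(docs, parents)
        self.children = {}
        for p in set(parents):
            idx = [i for i, q in enumerate(parents) if q == p]
            self.children[p] = NaiveBayes().fit(
                [docs[i] for i in idx], [leaf_labels[i] for i in idx]
            )
        return self

    def predict(self, doc):
        parent = self.root.predict(doc)
        return self.children[parent].predict(doc)


# Hypothetical toy data: two parent issues, each with two child issues.
parent_of = {
    "cost/compliance": "cost",
    "cost/small_business": "cost",
    "safety/workers": "safety",
    "safety/public": "safety",
}
docs = [
    "cost compliance paperwork",
    "cost small business",
    "risk workers injury",
    "risk public health",
]
labels = ["cost/compliance", "cost/small_business", "safety/workers", "safety/public"]

flat = NaiveBayes().fit(docs, labels)
hier = TopDownClassifier(parent_of).fit(docs, labels)
print(flat.predict("small business"))  # cost/small_business
print(hier.predict("workers injury"))  # safety/workers
```

Top-down classification restricts each second-level decision to the children of the predicted parent, so an error at the first level cannot be recovered at the second; a flat classifier avoids that cascade but must discriminate among all leaf issues at once.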