Automated classification of congressional legislation

Authors:
Stephen Purpura; Dustin Hillard
Affiliations:
Harvard University; University of Washington
Venue:
dg.o '06 Proceedings of the 2006 international conference on Digital government research
Year:
2006



Abstract

For social science researchers, content analysis and classification of United States Congressional legislative activities have been time-consuming and costly. The Library of Congress THOMAS system provides detailed information about bills and laws, but its classification system, the Legislative Indexing Vocabulary (LIV), is geared toward information retrieval rather than the pattern or historical trend recognition that social scientists value. The same event (a bill) may be coded with many subjects at the same time, with little indication of its primary emphasis. In addition, because the LIV system has not been applied to other activities, it cannot be used to compare, for example, legislative issue attention to executive, media, or public issue attention.

This paper presents the Congressional Bills Project's (www.congressionalbills.org) automated classification system. The system applies a topic-spotting classification algorithm to the task of coding each legislative activity into one of 226 subtopic areas. The algorithm uses a traditional bag-of-words document representation, an extensive set of human-coded examples, and an exhaustive topic coding system developed for use by the Congressional Bills Project and the Policy Agendas Project (www.policyagendas.org). Experimental results show that the automated system is about as effective as human assessors while offering significant time and cost savings. The paper concludes by discussing challenges to moving the system into operational use.
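To make the idea of single-label topic coding from a bag-of-words representation concrete, the following is a minimal sketch in Python. It is not the authors' system: the abstract does not specify the classifier here, so a simple nearest-centroid rule with cosine similarity stands in for it. The function names, the toy bill titles, and the two labels are all hypothetical; the real task uses 226 subtopic codes and a far larger set of human-coded examples.

```python
import math
import re
from collections import Counter, defaultdict


def bag_of_words(text):
    """Lowercase the text and count word occurrences, ignoring word order."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def train_centroids(examples):
    """Sum the word counts of all human-coded examples that share a label."""
    centroids = defaultdict(Counter)
    for text, label in examples:
        centroids[label].update(bag_of_words(text))
    return dict(centroids)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, centroids):
    """Return the single label whose centroid is most similar to the text."""
    bow = bag_of_words(text)
    return max(centroids, key=lambda label: cosine(bow, centroids[label]))


# Hypothetical toy data; the labels are placeholders, not real subtopic codes.
EXAMPLES = [
    ("A bill to expand health insurance coverage for children", "health"),
    ("A bill to fund hospital construction in rural areas", "health"),
    ("A bill to authorize appropriations for military construction", "defense"),
    ("A bill to increase pay for members of the armed forces", "defense"),
]

centroids = train_centroids(EXAMPLES)
print(classify("A bill to improve rural health clinics", centroids))
```

Because the classifier returns exactly one label per document, it mirrors the requirement that each legislative activity be coded into a single subtopic, in contrast to a multi-subject index such as LIV.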