The nature of statistical learning theory
The nature of statistical learning theory
Making large-scale support vector machine learning practical
Advances in kernel methods
A re-examination of text categorization methods
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Near-duplicate detection for eRulemaking
dg.o '05 Proceedings of the 2005 national conference on Digital government research
Why inverse document frequency?
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Multidimensional text analysis for eRulemaking
dg.o '06 Proceedings of the 2006 international conference on Digital government research
Active learning for e-rulemaking: public comment categorization
dg.o '08 Proceedings of the 2008 international conference on Digital government research
Get out the vote: determining support or opposition from congressional floor-debate transcripts
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Creating and visualizing fuzzy document classification
SMC'09 Proceedings of the 2009 IEEE international conference on Systems, Man and Cybernetics
Hi-index | 0.01 |
For social science researchers, content analysis and classification of United States Congressional legislative activities have been time consuming and costly. The Library of Congress THOMAS system provides detailed information about bills and laws, but its classification system, the Legislative Indexing Vocabulary (LIV), is geared toward information retrieval instead of the pattern or historical trend recognition that social scientists value. The same event (a bill) may be coded with many subjects at the same time, with little indication of its primary emphasis. In addition, because the LIV system has not been applied to other activities, it cannot be used to compare (for example) legislative issue attention to executive, media, or public issue attention.This paper presents the Congressional Bills Project's (www.congressionalbills.org) automated classification system. This system applies a topic spotting classification algorithm to the task of coding legislative activities into one of 226 subtopic areas. The algorithm uses a traditional bag-of-words document representation, an extensive set of human coded examples, and an exhaustive topic coding system developed for use by the Congressional Bills Project and the Policy Agendas Project (www.policyagendas.org). Experimental results demonstrate that the automated system is about as effective as human assessors, but with significant time and cost savings. The paper concludes by discussing challenges to moving the system into operational use.