Detecting outlier sections in US congressional legislation

Authors:
Elif Aktolga;Irene Ros;Yannick Assogba
Affiliations:
University of Massachusetts Amherst, Amherst, MA, USA;IBM Watson Research Center, Cambridge, MA, USA;IBM Watson Research Center, Cambridge, MA, USA
Venue:
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Year:
2011

Abstract

Reading congressional legislation, also known as bills, is often tedious because bills tend to be long and written in complex language. In IBM Many Bills, an interactive web-based visualization of legislation, users of different backgrounds can browse bills and quickly explore the parts that interest them. One task users have is locating sections that do not seem to fit the overall topic of the bill. In this paper, we present novel techniques that use approaches from information retrieval to determine which sections within a bill are likely to be outliers. The most promising techniques first detect the most topically relevant parts of a bill by ranking its sections, and then compare these topically relevant parts with the remaining sections in the bill. To compare sections, we use various dissimilarity metrics based on Kullback-Leibler divergence. The results indicate that these techniques are more successful than a classification-based approach. Finally, we analyze how well the dissimilarity metrics discriminate between sections that are strong outliers and those that are 'milder' outliers.
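
The two-step idea in the abstract (rank sections to find the topical core, then measure how far every section diverges from that core) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the ranking function, the smoothing, or the exact metrics. Whitespace tokenization, additive smoothing with `alpha`, ranking by divergence from a whole-bill unigram model, the `top_k` cutoff, and all function names here are assumptions made for illustration only.

```python
import math
from collections import Counter


def smoothed_model(tokens, vocab, alpha=0.1):
    """Unigram distribution over vocab with additive smoothing (assumed choice)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) in nats; p and q must be defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def outlier_scores(sections, top_k=2, alpha=0.1):
    """Score each section by its KL divergence from the bill's topical core."""
    tokenized = [s.lower().split() for s in sections]
    vocab = sorted({w for toks in tokenized for w in toks})
    models = [smoothed_model(toks, vocab, alpha) for toks in tokenized]

    # Step 1 (assumed ranking): sections closest to the whole-bill model
    # are treated as the most topically relevant parts.
    bill_model = smoothed_model([w for toks in tokenized for w in toks], vocab, alpha)
    ranked = sorted(range(len(sections)),
                    key=lambda i: kl_divergence(models[i], bill_model))
    core = ranked[:top_k]

    # Step 2: compare every section with a model built from the core sections.
    core_model = smoothed_model([w for i in core for w in tokenized[i]], vocab, alpha)
    return [kl_divergence(m, core_model) for m in models]


# Toy bill (invented text): three health-related sections and one unrelated one.
bill = [
    "health insurance coverage for patients",
    "health care providers and insurance plans",
    "patients health coverage and care plans",
    "highway bridge construction funding grants",
]
scores = outlier_scores(bill)
likely_outlier = scores.index(max(scores))
```

In this toy bill the highway section shares no vocabulary with the core sections, so it receives the largest divergence. Because KL divergence is asymmetric, a symmetrized variant may suit some comparisons better; which variant works best is not something this sketch can settle.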