Reading congressional legislation, also known as bills, is often tedious because bills tend to be long and written in complex language. In IBM Many Bills, an interactive web-based visualization of legislation, users with different backgrounds can browse bills and quickly explore the parts that interest them. One task users face is locating sections that do not seem to fit the overall topic of the bill. In this paper, we present novel techniques, built on approaches from information retrieval, for determining which sections within a bill are likely to be outliers. The most promising techniques first detect the most topically relevant parts of a bill by ranking its sections, and then compare these topically relevant parts with the remaining sections in the bill. To compare sections, we use various dissimilarity metrics based on Kullback-Leibler divergence. The results indicate that these techniques are more successful than a classification-based approach. Finally, we analyze how well the dissimilarity metrics discriminate between sections that are strong outliers and those that are 'milder' outliers.
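The two-step idea described above (rank sections by topical relevance, then compare the rest against the most relevant ones using a Kullback-Leibler based dissimilarity) can be illustrated with a minimal sketch. This is not the paper's implementation: the whitespace tokenizer, the additively smoothed unigram models, the use of divergence from the whole-bill model as the relevance ranking, and the names `unigram_model`, `kl_divergence`, `rank_outliers`, `core_size`, and `alpha` are all assumptions made for illustration.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=0.1):
    """Additively smoothed unigram distribution over a fixed vocabulary.

    Smoothing (an assumption here) keeps every probability strictly
    positive, so the KL divergence below is always finite.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """D(p || q) in bits, for two distributions over the same vocabulary."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)


def rank_outliers(sections, core_size=2, alpha=0.1):
    """Return (section_index, score) pairs, most outlier-like first."""
    tokenized = [s.lower().split() for s in sections]  # naive tokenizer (assumption)
    vocab = sorted({w for toks in tokenized for w in toks})
    models = [unigram_model(t, vocab, alpha) for t in tokenized]
    bill_model = unigram_model([w for t in tokenized for w in t], vocab, alpha)

    # Step 1 (assumed ranking criterion): sections closest to the whole-bill
    # model are treated as the topically relevant core.
    by_relevance = sorted(
        range(len(sections)),
        key=lambda i: kl_divergence(models[i], bill_model),
    )
    core = by_relevance[:core_size]
    core_model = unigram_model(
        [w for i in core for w in tokenized[i]], vocab, alpha
    )

    # Step 2: score each remaining section by its divergence from the core.
    scored = [
        (i, kl_divergence(models[i], core_model))
        for i in by_relevance[core_size:]
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy "bill" with made-up section texts, for illustration only.
sections = [
    "health insurance coverage for health care providers",
    "health care insurance premiums and coverage rules",
    "insurance coverage for preventive health care",
    "tariff schedule for imported steel and aluminum",
]
ranking = rank_outliers(sections, core_size=2)
for index, score in ranking:
    print(index, round(score, 3))
```

In this toy input the trade-related section (index 3) receives the highest score, since it shares almost no vocabulary with the topical core. Real bills would need proper tokenization, stop-word handling, and a principled choice of smoothing and core size, none of which are specified here.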