U.S. regulatory agencies are required to solicit, consider, and respond to public comments before issuing regulations. In recent years, agencies have begun to accept comments via both email and Web forms. The transition from paper to electronic comments makes it much easier for individuals to customize "form" letters, which they do, creating "near-duplicate" comments that express the same viewpoint in slightly different language. This paper explores the use of simple text clustering and retrieval algorithms for identifying near-duplicate public comments. Experiments with public comments about a recent regulation proposed by the Environmental Protection Agency (EPA) demonstrate the effectiveness of the algorithms.
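To make the idea of a "near-duplicate" comment concrete, the sketch below flags comments whose wording overlaps heavily with a known form letter. It is an illustration only and not the method evaluated in the paper: word-shingle Jaccard similarity is a generic technique substituted here, and the function names, the shingle size `k`, the `threshold` value, and the sample comments are all assumptions made for the example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(form_letter, comments, threshold=0.5, k=3):
    """Return indices of comments whose shingle overlap with form_letter meets threshold.

    The threshold of 0.5 is an arbitrary illustrative value, not a tuned one.
    """
    reference = shingles(form_letter, k)
    return [
        i for i, comment in enumerate(comments)
        if jaccard(reference, shingles(comment, k)) >= threshold
    ]


# Hypothetical sample data, invented for illustration.
form_letter = (
    "I urge the EPA to adopt strict limits on mercury emissions from power plants."
)
comments = [
    form_letter + " My family lives near a coal plant.",   # customized form letter
    "Please withdraw this proposal because it will raise electricity prices.",
    form_letter,                                            # exact duplicate
]

flagged = near_duplicates(form_letter, comments)
```

With this sample data, `flagged` contains the indices of the customized letter and the exact copy, while the unrelated comment is left out. A retrieval-style variant would rank all comments by similarity to the form letter instead of applying a fixed cutoff.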