Web document clustering: a feasibility demonstration
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Probabilistic latent semantic indexing
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Efficient clustering of high-dimensional data sets with application to reference matching
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Combining multiple classifiers for text categorization
Proceedings of the tenth international conference on Information and knowledge management
FREM: fast and robust EM clustering for large data sets
Proceedings of the eleventh international conference on Information and knowledge management
An Efficient Fuzzy C-Means Clustering Algorithm
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
DEXA '97 Proceedings of the 8th International Workshop on Database and Expert Systems Applications
The Journal of Machine Learning Research
TextTiling: segmenting text into multi-paragraph subtopic passages
Computational Linguistics
Advances in domain independent linear text segmentation
NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Similarity between words computed by spreading activation on an English dictionary
EACL '93 Proceedings of the sixth conference on European chapter of the Association for Computational Linguistics
A model of lexical attraction and repulsion
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Text segmentation based on similarity between words
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
A statistical model for domain-independent text segmentation
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Effective document clustering for large heterogeneous law firm collections
ICAIL '05 Proceedings of the 10th international conference on Artificial intelligence and law
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
A statistical model for topic segmentation and clustering
Canadian AI'08 Proceedings of the Canadian Society for computational studies of intelligence, 21st conference on Advances in artificial intelligence
Data-Intensive Text Processing with MapReduce
Data-Intensive Text Processing with MapReduce
Linked data classification: a feature-based approach
Proceedings of the Joint EDBT/ICDT 2013 Workshops
The significance of evaluation in AI and law: a case study re-examining ICAIL proceedings
Proceedings of the Fourteenth International Conference on Artificial Intelligence and Law
Hi-index | 0.00 |
Clustering is a useful tool for helping users navigate, summarize, and organize large quantities of textual documents available on the Internet, in news sources, and in digital libraries. A variety of clustering methods have also been applied to the legal domain, with various degrees of success. Some unique characteristics of legal content as well as the nature of the legal domain present a number of challenges. For example, legal documents are often multi-topical, contain carefully crafted, professional, domain-specific language, and possess a broad and unevenly distributed coverage of legal issues. Moreover, unlike widely accessible documents on the Internet, where search and categorization services are generally free, the legal profession is still largely a fee-for-service field that makes the quality (e.g., in terms of both recall and precision) a key differentiator of provided services. This paper introduces a classification-based recursive soft clustering algorithm with built-in topic segmentation. The algorithm leverages existing legal document metadata such as topical classifications, document citations, and click stream data from user behavior databases, into a comprehensive clustering framework. Techniques associated with the algorithm have been applied successfully to very large databases of legal documents, which include judicial opinions, statutes, regulations, administrative materials and analytical documents. Extensive evaluations were conducted to determine the efficiency and effectiveness of the proposed algorithm. Subsequent evaluations conducted by legal domain experts have demonstrated that the quality of the resulting clusters based upon this algorithm is similar to those created by domain experts.