Legal documents categorization by compression

Authors:
Antonio Mastropaolo;Francesco Pallante;Daniele P. Radicioni
Affiliations:
Università di Aosta, Strada Cappuccini, Aosta, Italy;Università di Torino, Torino, Italy;Università di Torino, Torino, Italy
Venue:
Proceedings of the Fourteenth International Conference on Artificial Intelligence and Law
Year:
2013

Abstract

In this paper we investigate how to categorize text excerpts from Italian normative texts. Although text categorization is a problem of broad interest, we single out a specific issue: categorizing the set of subjects in which Italian Regions are allowed to produce norms, the so-called residual legislative power problem. It consists in making explicit a set of subjects that was originally defined only in a residual and negative fashion. The categorization of legal text fragments is acknowledged to be a difficult problem, characterized by abstract concepts and a variety of locutions used to denote them, by convoluted sentence structure, and by several other facets. In addition, in the present case the subjects often partially overlap, and no training set of sufficient size exists for the problem under consideration; all these aspects make our task challenging. In this setting, classical feature-based approaches provide poor-quality results, so we explored algorithms based on compression techniques. We tested three such techniques: we illustrate their main features and report the results of an experiment in which our implementation of these algorithms is compared with the output of standard machine learning algorithms. Far from having found a silver bullet, we show that compression-based techniques provide the best results for the problem at hand, and argue that these approaches can be effectively coupled with more informative and semantically grounded ones.
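
The abstract does not name the three compression techniques that were tested, so the following is only an illustrative sketch of one widely used compression-based approach: the normalized compression distance (NCD) combined with nearest-neighbour labelling. It is not the authors' implementation. The choice of compressor (zlib), the function names, the category labels, and the toy English fragments are all assumptions made for illustration.

```python
import zlib
from typing import Mapping, Sequence


def compressed_size(text: str) -> int:
    """Length in bytes of the zlib-compressed UTF-8 encoding of `text`."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size. Lower values mean the two
    strings share more structure that the compressor can exploit.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(fragment: str, labelled: Mapping[str, Sequence[str]]) -> str:
    """Return the label of the labelled example closest to `fragment` by NCD."""
    best_label, best_distance = "", float("inf")
    for label, examples in labelled.items():
        for example in examples:
            distance = ncd(fragment, example)
            if distance < best_distance:
                best_label, best_distance = label, distance
    return best_label


# Hypothetical toy data: two invented subject categories, one example each.
toy_examples = {
    "health": [
        "Protection of health: organisation of hospitals, local health units "
        "and public health services in the regional territory."
    ],
    "transport": [
        "Local public transport: regional railways, bus lines, ports "
        "and the road network in the regional territory."
    ],
}
fragment = "The Region regulates the organisation of hospitals and local health units."
predicted = classify(fragment, toy_examples)
```

A distance of this kind needs no feature extraction and no model training, only a handful of labelled fragments per subject, which is why the family of methods is a natural candidate when, as the abstract notes, a sufficiently large training set is not available.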