Is linguistic information relevant for the classification of legal texts?

  • Authors: Teresa Gonçalves; Paulo Quaresma
  • Affiliations: Universidade de Évora, Évora, Portugal (both authors)
  • Venue: ICAIL '05: Proceedings of the 10th International Conference on Artificial Intelligence and Law
  • Year: 2005

Abstract

Text classification is an important task in the legal domain. Most legal information is stored as text in a largely unstructured format, so it is important to be able to classify these texts automatically into a predefined set of concepts.

Support Vector Machines (SVMs), a family of machine learning algorithms, have been shown to be good classifiers for text bases [12]. In this paper, SVMs are applied to the classification of European Portuguese legal texts (the Portuguese Attorney General's Office Decisions), and the relevance of linguistic information in this domain, namely lemmatisation and part-of-speech tags, is evaluated.

The results show that this linguistic information can be used both to improve the classification results and to decrease the number of features needed by the learning algorithm.
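
As a rough illustration of why lemmatisation and part-of-speech tags can reduce the number of features, the sketch below builds two bag-of-words representations of the same toy document: one over raw surface forms, and one over lemmas restricted to selected part-of-speech tags. It is not the authors' pipeline. The English tokens, the tag names, the choice of kept tags, and the function names are all hypothetical, and a real system would obtain the annotations from a lemmatiser and tagger before passing the features to an SVM.

```python
from collections import Counter

# Hypothetical pre-annotated tokens: (surface form, lemma, part-of-speech tag).
# A real pipeline would produce these with a lemmatiser and a POS tagger.
DOC = [
    ("the", "the", "DET"),
    ("courts", "court", "NOUN"),
    ("decided", "decide", "VERB"),
    ("that", "that", "SCONJ"),
    ("the", "the", "DET"),
    ("court", "court", "NOUN"),
    ("decides", "decide", "VERB"),
    ("appeals", "appeal", "NOUN"),
]


def surface_features(tokens):
    """Bag of words over raw surface forms."""
    return Counter(form for form, _lemma, _pos in tokens)


def linguistic_features(tokens, keep_pos=("NOUN", "VERB")):
    """Bag of lemmas, keeping only tokens whose POS tag is in keep_pos."""
    return Counter(lemma for _form, lemma, pos in tokens if pos in keep_pos)


surface = surface_features(DOC)
linguistic = linguistic_features(DOC)

# Inflected variants collapse onto one lemma, and function words are dropped,
# so the second representation has fewer distinct features.
print(len(surface), "distinct surface-form features")
print(len(linguistic), "distinct lemma features after POS filtering")
```

In this toy case "courts" and "court" collapse onto a single lemma and the function words are filtered out, which is the kind of feature reduction the abstract refers to. Whether accuracy also improves depends on the data and on which tags are kept.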