Extracting definitions from Brazilian legal texts

  • Authors:
  • Edilson Ferneda; Hércules Antonio do Prado; Augusto Herrmann Batista; Marcello Sandi Pinheiro

  • Affiliations:
  • Graduate Program on Knowledge and IT Management, Catholic University of Brasilia, Brasília, DF, Brazil; Graduate Program on Knowledge and IT Management, Catholic University of Brasilia, Brasília, DF, Brazil, Embrapa - Management and Strategy Secretariat, Brasília, DF, Brazil; Graduate Program on Knowledge and IT Management, Catholic University of Brasilia, Brasília, DF, Brazil, Logistics and Information Technology Secretariat, Ministry of Planning, Budget and Manag ...; COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil, Centro de Tecnologia, Cidade Universitária, Rio de Janeiro, RJ, Brazil

  • Venue:
  • ICCSA'12: Proceedings of the 12th International Conference on Computational Science and Its Applications, Volume Part III
  • Year:
  • 2012

Abstract

In order to avoid ambiguity and to ensure, as far as possible, a strict interpretation of the law, legal texts usually define the specific lexical terms used in their discourse by means of normative rules. Because a large number of rules is often in effect in a given domain, extracting these definitions manually would be costly. This paper presents an approach to this problem based on a variation of an automated natural language processing technique for Brazilian Portuguese texts. For the sake of generality, the proposed solution was developed to address the broader problem of building a glossary from domain-specific texts that contain definitions among their content. The solution was applied to a corpus of texts from the telecommunications regulation domain, and the results are reported. The usual natural language processing pipeline was followed: preprocessing, segmentation, and part-of-speech tagging. A set of feature extraction functions is specified and used, together with reference glossary information on whether or not a text fragment is a definition, to train an SVM classifier. Finally, the definitions are extracted from the texts and evaluated on a test corpus, which also contains the reference glossary annotations on definitions. The results are then discussed in light of other definition extraction techniques.
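The pipeline summarized above (preprocessing, segmentation, feature extraction, SVM training, extraction) can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the cue patterns (`entende-se`, `considera-se`, `para os fins`, a term followed by a colon, an inciso prefix such as `I -`), the toy training fragments, and the helper names `preprocess`, `segment`, `features`, `train_linear_svm` and `extract_definitions` are all hypothetical. Part-of-speech tagging is omitted, and a Pegasos-style linear SVM trained by stochastic subgradient descent on the hinge loss stands in for whatever SVM implementation the paper actually used.

```python
import random
import re

# Hypothetical lexical cues that often introduce definitions in Brazilian
# Portuguese regulations; chosen for illustration, not taken from the paper.
CUE_PATTERNS = [
    re.compile(r"\bentende-se\b"),
    re.compile(r"\bconsidera-se\b"),
    re.compile(r"\bpara os fins\b"),
]
TERM_COLON = re.compile(r"^[^:]{1,60}:")    # e.g. "estação: conjunto de ..."
INCISO = re.compile(r"^[ivxlc]+\s*-\s*")    # e.g. "i - ", "xii - " (after lowercasing)


def preprocess(text):
    """Lowercase and collapse spaces/tabs; line breaks are kept for segmentation."""
    return re.sub(r"[ \t]+", " ", text.lower()).strip()


def segment(text):
    """Split into candidate fragments at line breaks and after '.' or ';'."""
    parts = re.split(r"\n+|(?<=[.;])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def features(fragment):
    """Binary feature vector for one fragment; the last entry is a constant bias."""
    body = INCISO.sub("", fragment)  # drop an "i - " prefix before the other checks
    vec = [1.0 if p.search(body) else 0.0 for p in CUE_PATTERNS]
    vec.append(1.0 if TERM_COLON.match(body) else 0.0)
    vec.append(1.0 if INCISO.match(fragment) else 0.0)
    vec.append(1.0)
    return vec


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, epochs=2000, seed=0):
    """Pegasos-style SGD on lam/2*||w||^2 + mean hinge loss; labels are +1/-1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1  # margin checked before the update
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def extract_definitions(text, w):
    """Return the fragments the linear classifier scores as definitions."""
    return [f for f in segment(preprocess(text)) if dot(w, features(f)) > 0]


# Toy annotated fragments (+1 = definition, -1 = other provision), invented here.
TRAIN = [
    ("I - Estação: conjunto de equipamentos destinados à telecomunicação;", 1),
    ("Para os fins deste regulamento, entende-se por serviço o conjunto de atividades.", 1),
    ("Considera-se usuário a pessoa que utiliza o serviço.", 1),
    ("A Agência fiscalizará a prestação do serviço.", -1),
    ("O prazo será de trinta dias.", -1),
    ("As prestadoras deverão publicar relatórios anuais.", -1),
]
X = [features(preprocess(s)) for s, _ in TRAIN]
y = [label for _, label in TRAIN]
w = train_linear_svm(X, y)

sample = ("Considera-se usuário a pessoa que utiliza o serviço. "
          "O prazo será de trinta dias.\nI - Estação: conjunto de equipamentos;")
print(extract_definitions(sample, w))
```

On the sample text, the sketch keeps the `considera-se` sentence and the `I - Estação:` item and drops the deadline provision. A realistic system would complement such hand-written cues with features derived from a Portuguese part-of-speech tagger and would measure performance on a held-out annotated corpus rather than on toy examples.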