Extracting definitions from Brazilian legal texts

Authors:
Edilson Ferneda; Hércules Antonio do Prado; Augusto Herrmann Batista; Marcello Sandi Pinheiro
Affiliations:
Graduate Program on Knowledge and IT Management, Catholic University of Brasilia, Brasília, DF, Brazil;
Graduate Program on Knowledge and IT Management, Catholic University of Brasilia, Brasília, DF, Brazil, Embrapa - Management and Strategy Secretariat, Brasília, DF, Brazil;
Graduate Program on Knowledge and IT Management, Catholic University of Brasilia, Brasília, DF, Brazil, Logistics and Information Technology Secretariat, Ministry of Planning, Budget and Manag ...;
COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil, Centro de Tecnologia, Cidade Universitária, Rio de Janeiro, RJ, Brazil
Venue:
ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part III
Year:
2012

Abstract

To avoid ambiguity and to ensure, as far as possible, a strict interpretation of the law, legal texts usually define the specific lexical terms used in their discourse by means of normative rules. Because a large number of rules is often in effect in a given domain, extracting these definitions manually would be costly. This paper presents an approach to this problem based on a variation of an automated natural language processing technique applied to Brazilian Portuguese texts. For the sake of generality, the proposed solution addresses the broader problem of building a glossary from domain-specific texts that contain definitions. The solution was applied to a corpus of texts from the telecommunications regulation domain, and the results are reported. The usual natural language processing pipeline was followed: preprocessing, segmentation, and part-of-speech tagging. A set of feature extraction functions is specified and used, together with reference glossary information on whether or not a text fragment is a definition, to train an SVM classifier. Finally, definitions are extracted from the texts and evaluated against a testing corpus, which also carries the reference glossary annotations on definitions. The results are then discussed in light of other definition extraction techniques.
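The pipeline described in the abstract (segmentation, feature extraction, SVM training on fragments labelled as definition or non-definition) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not list the actual feature functions, so the two surface cues below (`DEF_VERB`, `TERM_COLON`) are hypothetical, part-of-speech tagging is omitted, the regex-based `segment` is a naive stand-in for a real sentence splitter, the training fragments are invented, and the SVM is a minimal linear one trained by sub-gradient descent on the regularised hinge loss rather than any particular library.

```python
import re

# Hypothetical surface cues; the abstract does not list the paper's actual
# feature functions, and POS-based features are omitted in this sketch.
DEF_VERB = re.compile(r"\b(é|são|considera-se|entende-se|define-se)\b", re.IGNORECASE)
TERM_COLON = re.compile(r"^[^:]{1,60}:\s")


def segment(text):
    """Naive segmentation: split after '.' or ';' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.;])\s+", text) if s.strip()]


def features(fragment):
    """Map a text fragment to a small binary feature vector."""
    return [
        1.0 if DEF_VERB.search(fragment) else 0.0,   # definitional verb present
        1.0 if TERM_COLON.match(fragment) else 0.0,  # "term: gloss" layout
    ]


def train_linear_svm(X, y, epochs=50, lr=0.1, lam=0.01):
    """Sub-gradient descent on the L2-regularised hinge loss (labels are +1/-1)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]  # shrinkage from the L2 term
            if margin < 1.0:  # hinge loss is active for this example
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def score(w, b, x):
    """Signed distance-like score; larger means 'more definition-like'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Invented toy fragments standing in for glossary-annotated training data.
train = [
    ("Usuário: pessoa que utiliza o serviço de telecomunicações.", +1),
    ("Considera-se estação o conjunto de equipamentos necessários.", +1),
    ("Esta resolução entra em vigor na data de sua publicação.", -1),
    ("Revogam-se as disposições em contrário.", -1),
]
X = [features(text) for text, _ in train]
y = [label for _, label in train]
w, b = train_linear_svm(X, y)

# Rank the fragments of a new text by how definition-like they look.
new_text = "Usuário: pessoa que utiliza o serviço. Revogam-se as disposições em contrário."
ranked = sorted(segment(new_text), key=lambda s: score(w, b, features(s)), reverse=True)
```

In a fuller version, the feature vector would presumably also draw on part-of-speech tags, and the hand-written trainer would be replaced by an established SVM library; the toy trainer is used here only to keep the sketch self-contained.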