Segmentation of legal documents

Authors:
Eneldo Loza Mencía
Affiliations:
TU Darmstadt, Germany
Venue:
Proceedings of the 12th International Conference on Artificial Intelligence and Law
Year:
2009

Abstract

An overwhelming number of legal documents is available in digital form. However, most of these texts are provided only in semi-structured form, i.e. their structure is expressed only implicitly through text formatting and alignment. In this form the documents are perfectly understandable to a human, but not to a machine. This is an obstacle to advanced intelligent legal information retrieval and knowledge systems. The reason for this lack of structured knowledge is that converting conventionally formatted texts into a structured, machine-readable form, a process called segmentation, is frequently done manually and is therefore very expensive. We introduce a trainable system based on state-of-the-art Information Extraction techniques for the automatic segmentation of legal documents. Our system makes special use of the structure implicitly given in the source digital file as well as of explicit knowledge about the target structure. Our evaluation on the French IPR Law demonstrates that the system is able to learn an effective segmenter from only a few manually processed training documents. In some cases, a single training example is sufficient to correctly process the remaining documents.
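To make the segmentation task more concrete, here is a toy Python sketch of turning implicitly structured text into an explicit hierarchy. It is not the trainable Information Extraction system described above. It only "learns" which leading keyword marks each structural level from one annotated example (a stand-in for the implicit formatting cues) and combines that with a fixed target hierarchy (a stand-in for the explicit knowledge about the target structure). The level names book > title > chapter > article are an assumption, loosely modelled on how French codes are divided. All names (`HIERARCHY`, `learn_cues`, `segment`) and the sample lines are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical target hierarchy (explicit knowledge about the target structure),
# ordered from the outermost to the innermost level.
HIERARCHY = ["book", "title", "chapter", "article"]


@dataclass
class Node:
    level: str
    heading: str
    text: list[str] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)


def learn_cues(annotated):
    """Map a heading line's first token to its level, using (line, label) pairs.

    label is a level name for headings and None for body text.
    """
    cues = {}
    for line, label in annotated:
        if label is not None:
            cues[line.split()[0]] = label
    return cues


def segment(lines, cues):
    """Build a tree from flat lines using the learned cues and HIERARCHY."""
    root = Node("root", "")
    stack = [root]  # path from the root to the currently open node
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        level = cues.get(line.split()[0])
        if level is None:
            # Body text belongs to the innermost open node.
            stack[-1].text.append(line)
            continue
        depth = HIERARCHY.index(level) + 1
        # Close open nodes at the same or a deeper level before opening this one.
        while len(stack) > 1 and HIERARCHY.index(stack[-1].level) + 1 >= depth:
            stack.pop()
        node = Node(level, line)
        stack[-1].children.append(node)
        stack.append(node)
    return root


# One annotated "training document" (hypothetical toy lines).
training = [
    ("Livre Ier", "book"),
    ("Titre Ier", "title"),
    ("Chapitre Ier", "chapter"),
    ("Article L111-1", "article"),
    ("body text", None),
]
cues = learn_cues(training)

# A new, unannotated document segmented with the learned cues.
new_doc = [
    "Livre II", "Titre Ier", "Chapitre Ier",
    "Article L211-1", "body a",
    "Article L211-2", "body b",
    "Chapitre II",
    "Article L212-1", "body c",
]
tree = segment(new_doc, cues)
```

With this input, the result has one book containing one title, which holds two chapters. The first chapter holds two articles, and each body line is attached to the article that precedes it. The keyword lookup is deliberately naive: a body line that happened to start with "Article" would be misread as a heading. Handling cases like that by exploiting richer layout and context is exactly what a learned segmenter has to do.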