Automatic text representation, classification and labeling in European law

  • Authors:
  • Erich Schweighofer;Andreas Rauber;Michael Dittenbach

  • Affiliations:
  • Institute of Public International Law, University of Vienna Research Center for Computers and Law, Universitätsstr. 2, A-1090 Vienna, Austria;Institute for Software Technology, Vienna University of Technology, Favoritenstr. 9-11/188, A-1040 Vienna, Austria;Institute for Software Technology, Vienna University of Technology, Favoritenstr. 9-11/188, A-1040 Vienna, Austria

  • Venue:
  • Proceedings of the 8th international conference on Artificial intelligence and law
  • Year:
  • 2001

Abstract

The huge text archives and retrieval systems of legal information have not yet achieved the well-known subject-oriented structure of legal commentaries. Content-based classification and text analysis remain a high-priority research topic. In the joint KONTERM, SOM and LabelSOM projects, neural network learning techniques are used to achieve compression rates in classification and analysis similar to those of manual legal indexing. The resulting maps of legal text corpora cluster related documents into units that are described by automatically selected descriptors. Extensive tests with text corpora of European case law have shown the feasibility of this approach. Classification and labeling proved very helpful for legal research. The Growing Hierarchical Self-Organizing Map represents both general and specialized features of legal text corpora. Segmenting documents into parts greatly improved the quality of labeling. The next challenge is to move from the tf × idf vector representation to a modified vector representation that takes thesauri or ontologies into account and reflects learned properties of legal text corpora.
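To make the pipeline named in the abstract more concrete, the following is a minimal sketch of tf × idf document vectors fed into a small flat self-organizing map, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`tfidf_vectors`, `train_som`, `best_unit`, `unit_labels`), the whitespace tokenizer, the `log(N / df)` weighting, the 2 × 2 grid and the learning schedule are all hypothetical choices. The labeling step simply picks the highest-weighted terms of a unit, which is a simplification and should not be read as the LabelSOM method. The Growing Hierarchical Self-Organizing Map is not covered here.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors): raw term frequency times log(N / df),
    with each document vector scaled to unit length."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * math.log(n_docs / df[t]) for t in vocab]
        norm = math.sqrt(sum(w * w for w in vec)) or 1.0
        vectors.append([w / norm for w in vec])
    return vocab, vectors


def best_unit(weights, vec):
    """Index of the map unit whose weight vector is closest to vec."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - x) ** 2 for w, x in zip(weights[i], vec)),
    )


def train_som(vectors, rows=2, cols=2, epochs=50, seed=0):
    """Train a small flat SOM; returns one weight vector per map unit."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs        # decays from 1 towards 0
        rate = 0.5 * frac                  # learning rate (illustrative value)
        radius = max(rows, cols) * frac    # neighbourhood radius on the grid
        for vec in vectors:
            b_row, b_col = divmod(best_unit(weights, vec), cols)
            for i, w in enumerate(weights):
                row, col = divmod(i, cols)
                dist = math.hypot(row - b_row, col - b_col)
                if dist <= radius:
                    influence = math.exp(-dist ** 2 / (2 * radius ** 2))
                    for j in range(dim):
                        w[j] += rate * influence * (vec[j] - w[j])
    return weights


def unit_labels(weights, vocab, unit, k=3):
    """Simplified labeling: the k terms with the largest weight in a unit.
    This is NOT the LabelSOM selection criterion."""
    ranked = sorted(range(len(vocab)), key=lambda j: weights[unit][j], reverse=True)
    return [vocab[j] for j in ranked[:k]]


if __name__ == "__main__":
    # Toy corpus invented for illustration only.
    corpus = [
        "court ruling on customs duty",
        "court ruling on customs duty",
        "court decision on state aid",
    ]
    vocab, vectors = tfidf_vectors(corpus)
    som = train_som(vectors)
    for text, vec in zip(corpus, vectors):
        unit = best_unit(som, vec)
        print(unit, unit_labels(som, vocab, unit), text)
```

Terms that occur in every document receive weight zero under this weighting, which is one reason a thesaurus- or ontology-aware representation could carry information that plain tf × idf discards.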