A system for classifying multi-label text into EuroVoc

Authors:
Guido Boella;Luigi Di Caro;Daniele Rispoli;Livio Robaldo
Affiliations:
Università di Torino;Università di Torino;Università di Torino;Università di Torino
Venue:
Proceedings of the Fourteenth International Conference on Artificial Intelligence and Law
Year:
2013

Abstract

In this work we present a working system for the automatic classification of text documents into the EuroVoc multilingual thesaurus. EuroVoc contains around 7,000 categories at different levels of specificity. The system takes a simple approach to multi-label texts, in which each document may have more than one associated category. The classifier is based on the well-known Support Vector Machine algorithm and is trained on the JRC-Acquis corpus, which contains around 23,000 documents, each labeled with six EuroVoc categories on average. The demonstration scenario will show the system classifying documents retrieved on site from the Eur-Lex web portal of the European Union, together with features for visualizing and navigating the texts at different levels of granularity.
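
The abstract does not say how the multi-label step is implemented. One common simple approach is binary relevance: train one independent binary classifier per category and return every category whose classifier scores positive. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It uses binary bag-of-words features and a linear classifier trained by subgradient steps on the hinge loss (the loss a linear SVM minimizes; the regularization term and any kernel are omitted for brevity). The function names, the toy documents, and the labels "fisheries" and "trade" are all hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Binary bag-of-words: the set of lowercased whitespace-separated tokens."""
    return set(text.lower().split())


def train_binary(samples, epochs=10, lr=0.5):
    """Fit one linear classifier by subgradient steps on the hinge loss.

    samples: list of (token_set, y) pairs with y in {+1, -1}.
    Simplification: no regularization term, so this is not a full SVM.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for tokens, y in samples:
            score = sum(weights[t] for t in tokens) + bias
            if y * score < 1:  # example is inside the margin: take a step
                for t in tokens:
                    weights[t] += lr * y
                bias += lr * y
    return weights, bias


def train_multilabel(docs):
    """Binary relevance: one independent binary classifier per label.

    docs: list of (text, set_of_labels) pairs.
    """
    labels = sorted({label for _, doc_labels in docs for label in doc_labels})
    tokenized = [(tokenize(text), doc_labels) for text, doc_labels in docs]
    models = {}
    for label in labels:
        samples = [
            (tokens, 1 if label in doc_labels else -1)
            for tokens, doc_labels in tokenized
        ]
        models[label] = train_binary(samples)
    return models


def predict(models, text):
    """Return every label whose classifier gives a strictly positive score."""
    tokens = tokenize(text)
    predicted = set()
    for label, (weights, bias) in models.items():
        score = sum(weights.get(t, 0.0) for t in tokens) + bias
        if score > 0:
            predicted.add(label)
    return predicted


# Hypothetical toy corpus; real training data would be JRC-Acquis documents
# with their EuroVoc descriptors.
TOY_DOCS = [
    ("fishing quota sea", {"fisheries"}),
    ("fishing vessel sea", {"fisheries"}),
    ("customs tariff import", {"trade"}),
    ("import duty tariff", {"trade"}),
    ("fishing import tariff sea", {"fisheries", "trade"}),
]

MODELS = train_multilabel(TOY_DOCS)

print(sorted(predict(MODELS, "fishing import tariff sea")))  # ['fisheries', 'trade']
print(sorted(predict(MODELS, "customs duty")))  # ['trade']
```

Because each category is decided independently, a document can receive zero, one, or several labels, which matches the multi-label setting the abstract describes. With around 7,000 categories, a real system would need sparse features and a more scalable trainer than this toy loop.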