A system for classifying multi-label text into EuroVoc

  • Authors:
  • Guido Boella;Luigi Di Caro;Daniele Rispoli;Livio Robaldo

  • Affiliations:
  • Universita' di Torino;Universita' di Torino;Universita' di Torino;Universita' di Torino

  • Venue:
  • Proceedings of the Fourteenth International Conference on Artificial Intelligence and Law
  • Year:
  • 2013

Abstract

In this work we present a working system for the automatic classification of text documents into the EuroVoc multilingual thesaurus. EuroVoc contains around 7,000 categories at different levels of specificity. The system relies on a simple approach to handling multi-label texts, where each document may have more than one associated category. The classifier is based on the well-known Support Vector Machine algorithm and is trained on the JRC-Acquis corpus, which contains around 23,000 documents labeled with six EuroVoc categories each on average. The demonstration scenario will show the ability of the system to classify documents retrieved live from the Eur-Lex web portal of the European Union, together with features for visualizing and navigating the texts at different levels of granularity.
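The abstract does not say which multi-label strategy or SVM implementation the system uses. As a rough illustration only, the sketch below assumes binary relevance (one independent linear SVM per category, with a document receiving every category whose classifier scores above zero) and trains each SVM by plain subgradient descent on the regularized hinge loss. The toy corpus, the label names, and all function names are hypothetical and are not taken from EuroVoc, JRC-Acquis, or the authors' code.

```python
from collections import Counter

# Toy corpus (assumption): label names are illustrative stand-ins,
# not actual EuroVoc descriptors or JRC-Acquis documents.
TRAIN = [
    ("fishing quota", {"fisheries"}),
    ("fishing tariff", {"fisheries", "trade"}),
    ("import tariff", {"trade"}),
    ("wheat subsidy", {"agriculture"}),
    ("wheat tariff", {"agriculture", "trade"}),
]


def build_vocab(texts):
    """Sorted list of all distinct lowercase tokens."""
    return sorted({word for text in texts for word in text.lower().split()})


def featurize(text, vocab):
    """Bag-of-words count vector over the given vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Linear SVM via subgradient descent on L2-regularized hinge loss.

    y must contain +1 / -1. Returns (weights, bias).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (dot(w, xi) + b)
            # Regularization step: shrink the weights (the bias is not regularized).
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Hinge-loss step: only examples inside the margin push the weights.
            if margin < 1.0:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def train_binary_relevance(train):
    """Train one independent binary SVM per label (binary relevance)."""
    vocab = build_vocab([text for text, _ in train])
    X = [featurize(text, vocab) for text, _ in train]
    labels = sorted(set().union(*(label_set for _, label_set in train)))
    models = {}
    for label in labels:
        y = [1 if label in label_set else -1 for _, label_set in train]
        models[label] = train_linear_svm(X, y)
    return vocab, models


def predict(text, vocab, models):
    """Return every label whose classifier gives a positive score."""
    x = featurize(text, vocab)
    return {label for label, (w, b) in models.items() if dot(w, x) + b > 0}


if __name__ == "__main__":
    vocab, models = train_binary_relevance(TRAIN)
    print(predict("fishing tariff", vocab, models))
```

Because each category gets its own classifier, a document can receive zero, one, or several labels, which matches the multi-label setting described above. With roughly 7,000 categories, a real system would rely on an optimized SVM library and sparse features rather than this pure-Python loop.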