In this paper we apply multilabel classification algorithms to the EUR-Lex database of legal documents of the European Union. For this document collection, we study three different multilabel classification problems, the largest being the categorization into the EUROVOC concept hierarchy with almost 4000 classes. We evaluate three algorithms: (i) the binary relevance approach, which independently trains one classifier per label; (ii) the multiclass multilabel perceptron algorithm, which respects dependencies between the base classifiers; and (iii) the multilabel pairwise perceptron algorithm, which trains one classifier for each pair of labels. All algorithms use the simple but very efficient perceptron algorithm as the underlying classifier, which makes them well suited to large-scale multilabel classification problems. The main challenge was that the almost 8,000,000 perceptrons that have to be trained in the pairwise setting can no longer be stored in memory. We solve this problem by resorting to the dual representation of the perceptron, which makes the pairwise approach feasible for problems of this size. The results on the EUR-Lex database confirm the good predictive performance of the pairwise approach and demonstrate its feasibility for large-scale tasks.
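The figure of almost 8,000,000 perceptrons follows from the number of label pairs, n(n-1)/2, which for n = 4000 (an upper bound on the "almost 4000" classes) gives 7,998,000. The Python sketch below illustrates that count and one possible form of a pairwise perceptron in dual representation. It is a minimal illustration under stated assumptions, not the implementation evaluated in the paper: the class name `DualPairwisePerceptron`, the helpers `num_pairwise_models` and `dot`, the sparse-dictionary document format, and the voting-based ranking are all hypothetical choices made here. The sketch also assumes that each model is kept as coefficients on training examples rather than as an explicit weight vector, so that the dot products with the training examples can be computed once per document and shared by all pairwise models.

```python
def num_pairwise_models(num_labels):
    """Number of label pairs, i.e. perceptrons in the pairwise setting."""
    return num_labels * (num_labels - 1) // 2


def dot(x, z):
    """Dot product of two sparse vectors stored as {feature: value} dicts."""
    if len(x) > len(z):
        x, z = z, x
    return sum(v * z.get(f, 0.0) for f, v in x.items())


class DualPairwisePerceptron:
    """Hypothetical sketch: one dual perceptron per label pair (i, j), i < j."""

    def __init__(self, labels):
        self.labels = sorted(labels)
        self.examples = []   # training documents (sparse vectors)
        self.alpha = {}      # (i, j) -> {example index: coefficient}

    def _kernel_row(self, x):
        # Computed once per document and reused by every pairwise model.
        return [dot(z, x) for z in self.examples]

    def _score(self, pair, k):
        # Positive score means label pair[0] is preferred over pair[1].
        return sum(a * k[t] for t, a in self.alpha.get(pair, {}).items())

    def fit(self, data, epochs=5):
        """data: list of (sparse vector, set of relevant labels)."""
        self.examples = [x for x, _ in data]
        for _ in range(epochs):
            for t, (x, relevant) in enumerate(data):
                k = self._kernel_row(x)
                # Only pairs with exactly one relevant label are trained.
                for i in relevant:
                    for j in self.labels:
                        if j in relevant:
                            continue
                        pair = (min(i, j), max(i, j))
                        y = 1 if pair[0] == i else -1
                        if y * self._score(pair, k) <= 0:
                            # Perceptron update in dual form: adjust the
                            # coefficient of example t for this pair.
                            coeffs = self.alpha.setdefault(pair, {})
                            coeffs[t] = coeffs.get(t, 0) + y

    def rank(self, x):
        """Rank labels by the number of pairwise votes they receive."""
        k = self._kernel_row(x)
        votes = {label: 0 for label in self.labels}
        for a, i in enumerate(self.labels):
            for j in self.labels[a + 1:]:
                winner = i if self._score((i, j), k) > 0 else j
                votes[winner] += 1
        return sorted(self.labels, key=lambda label: -votes[label])


# Toy usage with invented documents and labels.
toy_data = [
    ({"tax": 1.0}, {"A"}),
    ({"fish": 1.0}, {"B"}),
    ({"tax": 1.0, "fish": 1.0}, {"A", "B"}),
    ({"trade": 1.0}, {"C"}),
]
model = DualPairwisePerceptron(labels={"A", "B", "C"})
model.fit(toy_data)
print(num_pairwise_models(4000))
print(model.rank({"tax": 1.0}))
```

In this form the memory cost grows with the number of updates made during training rather than with the number of pairs times the number of features, which is one way to read the claim that the dual representation makes the pairwise approach feasible at this scale.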