Error Correcting Codes with Optimized Kullback-Leibler Distances for Text Categorization

Authors:
Jörg Kindermann;Gerhard Paass;Edda Leopold
Venue:
PKDD '01 Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery
Year:
2001

Citing 9
Cited 4

Inductive learning algorithms and representations for text categorization

Proceedings of the seventh international conference on Information and knowledge management
Foundations of statistical natural language processing

Foundations of statistical natural language processing
Text Categorization with Support Vector Machines. How to Represent Texts in Input Space?

Machine Learning
Text Categorization with Support Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
On the Learnability and Design of Output Codes for Multiclass Problems

COLT '00 Proceedings of the Thirteenth Annual Conference on Computational Learning Theory
Authorship Attribution with Support Vector Machines

Applied Intelligence
A New Multi-Class SVM Based on a Uniform Convergence Result

IJCNN '00 Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN'00), Volume 4
Solving multiclass learning problems via error-correcting output codes

Journal of Artificial Intelligence Research

The indiGo Project: Enhancement of Experience Management and Process Learning with Moderated Discourses

Industrial Conference on Data Mining: Advances in Data Mining, Applications in E-Commerce, Medicine, and Knowledge Management
An automatic diagnosis system based on thyroid gland: ADSTG

Expert Systems with Applications: An International Journal
An expert system based on Generalized Discriminant Analysis and Wavelet Support Vector Machine for diagnosis of thyroid diseases

Expert Systems with Applications: An International Journal
A New Expert System for Diagnosis of Lung Cancer: GDA--LS_SVM

Journal of Medical Systems

Abstract

We extend the multi-class categorization scheme that Dietterich and Bakiri (1995) proposed for binary classifiers, which is based on error correcting codes. The extension comprises computing the codes with a simulated annealing algorithm and optimizing the Kullback-Leibler (KL) category distances within the code-words. For the first time, we apply the scheme to text categorization with support vector machines (SVMs) on several large text corpora with more than 100 categories. The results are compared to 1-of-N coding (i.e. one SVM for each text category). We also investigate codes that optimize the KL distance between the text categories which are merged in the code-words. We find that error correcting codes outperform 1-of-N coding as the code length increases. For very long codes, KL-distance optimization improves performance further in some cases.
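
To illustrate why longer error correcting codes can outperform 1-of-N coding, the following minimal Python sketch compares the two on a toy problem. Everything in it is an assumption made for illustration: the 4-category, 7-bit code matrix, the helper names `min_hamming_distance` and `decode`, and the use of plain Hamming decoding. It does not reproduce the simulated annealing search or the KL-distance optimization described in the abstract.

```python
# Illustrative sketch only: the code matrix below is hypothetical,
# not one of the codes computed in the paper.

# Each row is the code-word of one category; each column defines one
# binary task (one binary classifier, e.g. an SVM, per column).
ECOC = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1, 0],
]

# 1-of-N coding: one binary classifier per category.
ONE_OF_N = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]


def hamming(a, b):
    """Number of positions in which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def min_hamming_distance(code):
    """Smallest pairwise Hamming distance between code-words."""
    n = len(code)
    return min(hamming(code[i], code[j])
               for i in range(n) for j in range(i + 1, n))


def decode(code, bits):
    """Index of the category whose code-word is closest to the predicted bits."""
    distances = [hamming(word, bits) for word in code]
    return distances.index(min(distances))


# Simulate one wrong binary classifier for a document of category 2.
noisy = list(ECOC[2])
noisy[0] ^= 1

d_ecoc = min_hamming_distance(ECOC)
d_one_of_n = min_hamming_distance(ONE_OF_N)
predicted = decode(ECOC, noisy)
```

A code with minimum distance d can correct up to floor((d - 1) / 2) wrong binary decisions. In this toy setup the 7-bit code therefore tolerates one misclassifying binary classifier, while the 1-of-N code cannot guarantee recovery from any.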