Document categorization based on minimum loss of reconstruction information

Authors:
Juan Carlos Gomez; Marie-Francine Moens
Affiliations:
Department of Computer Science, Katholieke Universiteit Leuven, Heverlee, Belgium (both authors)
Venue:
MICAI'12: Proceedings of the 11th Mexican International Conference on Advances in Computational Intelligence, Volume Part II
Year:
2012

Abstract

In this paper we present and validate a novel approach to single-label multi-class document categorization. The approach relies on a statistical property of Principal Component Analysis (PCA): for a given rank, the principal components minimize the reconstruction error of the training documents from which they are computed. For each category, the training documents of that category are used to compute a low-rank category transformation matrix, which projects those documents into a low-rank space and then reconstructs them in the original space with minimum loss of information. The proposed method, called the Minimum Loss of Reconstruction Information (mLRI) classifier, extends this property to unseen documents, assigning each new document to the category whose transformation matrix reconstructs it with the least loss of information. Experiments on three well-known multi-class text categorization datasets show that the proposed approach performs stably and generally better than other popular categorization methods.
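The decision rule described in the abstract can be illustrated with a minimal sketch in Python with NumPy. It rests on several assumptions that are not taken from the paper: documents are dense term-weight vectors (for example tf-idf), each category's data is mean-centred before PCA, the category transformation matrix is the top-`rank` right singular vectors from an SVD, and "loss of information" is measured as squared Euclidean reconstruction error. The function names and the `rank` parameter are hypothetical. This is not the authors' implementation, and their construction of the matrix, choice of rank, and loss measure may differ.

```python
import numpy as np


def fit_category_bases(X, y, rank):
    """Fit one low-rank PCA basis per category (hypothetical helper).

    X: (n_docs, n_terms) document-term matrix (assumed dense term weights).
    y: (n_docs,) category labels.
    rank: number of principal components kept per category (assumed fixed).
    Returns a dict mapping label -> (mean vector, W with shape (n_terms, k)).
    """
    models = {}
    for label in np.unique(y):
        Xc = X[y == label]
        mu = Xc.mean(axis=0)  # assumption: centre each category before PCA
        # Rows of Vt are the principal directions of the centred category data.
        _, _, Vt = np.linalg.svd(Xc - mu, full_matrices=False)
        k = min(rank, Vt.shape[0])
        models[label] = (mu, Vt[:k].T)  # W has orthonormal columns
    return models


def reconstruction_error(x, mu, W):
    """Squared error after projecting x to the low-rank space and back."""
    z = (x - mu) @ W          # project into the k-dimensional space
    x_hat = mu + z @ W.T      # reconstruct in the original term space
    return float(np.sum((x - x_hat) ** 2))


def predict(models, X):
    """Assign each document to the category with minimum reconstruction loss."""
    labels = list(models)
    errors = np.array(
        [[reconstruction_error(x, *models[c]) for c in labels] for x in X]
    )
    return [labels[i] for i in errors.argmin(axis=1)]


if __name__ == "__main__":
    # Toy data: category "a" varies only along term 0, "b" only along term 2.
    X = np.array([[1, 0, 0, 0], [2, 0, 0, 0], [3, 0, 0, 0],
                  [0, 0, 1, 0], [0, 0, 2, 0], [0, 0, 3, 0]], dtype=float)
    y = np.array(["a", "a", "a", "b", "b", "b"])
    models = fit_category_bases(X, y, rank=1)
    X_test = np.array([[5, 0, 0, 0], [0, 0, 7, 0]], dtype=float)
    print([str(p) for p in predict(models, X_test)])  # ['a', 'b']
```

On the toy data, the first test document is reconstructed exactly by category "a" (error 0) but loses everything under category "b" (error 25), so it is assigned to "a"; the second test document is assigned to "b" by the same argument. Fitting one basis per category keeps training independent across categories, and prediction costs one projection and one reconstruction per category for each document.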