Minimizer of the Reconstruction Error for multi-class document categorization

Authors:
Juan Carlos Gomez;Marie-Francine Moens
Affiliations:
-;-
Venue:
Expert Systems with Applications: An International Journal
Year:
2014

Abstract

In this article we introduce and validate an approach for single-label multi-class document categorization based on text content features. The approach relies on a statistical property of Principal Component Analysis: it minimizes the reconstruction error of the training documents used to compute a low-rank category transformation matrix. Such a matrix transforms the training documents of a given category to a new low-rank space and then reconstructs them in the original space with minimum reconstruction error. The proposed method, called the Minimizer of the Reconstruction Error (mRE) classifier, extends this property and applies it to new, unseen test documents. Experiments on four multi-class text categorization datasets are conducted to test the stable and generally better performance of the proposed approach in comparison with other popular classification methods.
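The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not spell out the decision rule, so the code assumes that a test document is assigned to the category whose low-rank PCA basis reconstructs it with the smallest error. The function names (`fit_category_bases`, `reconstruction_error`, `predict`), the mean-centering step, the fixed `rank` parameter, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def fit_category_bases(X, y, rank):
    """Fit one low-rank PCA basis per category (hypothetical helper).

    X: (n_docs, n_features) document-term matrix, y: category labels.
    Returns {label: (mean, W)} where the rows of W are the top `rank`
    principal directions of that category's training documents.
    """
    models = {}
    for label in np.unique(y):
        Xc = X[y == label]
        mean = Xc.mean(axis=0)  # centering is an assumption of this sketch
        # Rows of Vt are principal directions, ordered by singular value.
        _, _, Vt = np.linalg.svd(Xc - mean, full_matrices=False)
        models[label] = (mean, Vt[:rank])
    return models

def reconstruction_error(x, mean, W):
    """Euclidean error after projecting x to the low-rank space and back."""
    centered = x - mean
    reconstructed = W.T @ (W @ centered)
    return float(np.linalg.norm(centered - reconstructed))

def predict(models, x):
    """Assumed decision rule: pick the category with the smallest error."""
    return min(models, key=lambda label: reconstruction_error(x, *models[label]))

# Toy data (made up): category 0 uses features 0-1, category 1 uses features 2-3.
X = np.array([
    [1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0], [2, 1, 0, 0],
    [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1], [0, 0, 1, 2],
], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

models = fit_category_bases(X, y, rank=2)
label_a = predict(models, np.array([3.0, 2.0, 0.0, 0.0]))
label_b = predict(models, np.array([0.0, 0.0, 2.0, 3.0]))
```

In this toy setting each test document lies in the subspace spanned by one category's training documents, so its reconstruction error is near zero for that category and clearly larger for the other. On real text data the rank would need to be tuned per dataset, and the article itself should be consulted for the actual formulation.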