This article introduces and validates an approach to single-label, multi-class document categorization based on text content features. The approach exploits a statistical property of Principal Component Analysis (PCA): the low-rank category transformation matrix computed from a set of training documents minimizes the reconstruction error of those documents. For a given category, this matrix projects the category's training documents into a new low-rank space and then reconstructs them in the original space with minimum reconstruction error. The proposed method, called the Minimizer of the Reconstruction Error (mRE) classifier, extends this property and applies it to new, unseen test documents. Experiments on four multi-class text categorization datasets assess the approach, which performs stably and generally better than other popular classification methods.
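The idea of classifying by reconstruction error can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one PCA basis per category, mean-centering of each category's documents, Euclidean reconstruction error, and assignment of a test document to the category whose basis reconstructs it with the smallest error. The function names `fit_category_bases`, `reconstruction_error`, and `predict`, and the `rank` parameter, are hypothetical.

```python
import numpy as np

def fit_category_bases(X, y, rank):
    """Per category, store the mean and the top-`rank` principal directions.

    Assumption: rows of X are document feature vectors, y holds one label per row.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    models = {}
    for label in np.unique(y):
        Xc = X[y == label]
        mean = Xc.mean(axis=0)
        # Rows of Vt are the principal directions of the centered category data.
        _, _, Vt = np.linalg.svd(Xc - mean, full_matrices=False)
        models[label] = (mean, Vt[:rank])
    return models

def reconstruction_error(x, mean, components):
    """Euclidean distance between x and its low-rank reconstruction."""
    centered = np.asarray(x, dtype=float) - mean
    # Project into the low-rank space, then map back to the original space.
    reconstructed = components.T @ (components @ centered)
    return float(np.linalg.norm(centered - reconstructed))

def predict(models, x):
    """Return the label whose basis reconstructs x with the smallest error."""
    return min(models, key=lambda label: reconstruction_error(x, *models[label]))

# Toy data (made up for illustration): each category varies along one axis.
X_train = [[1, 0, 0], [2, 0, 0], [3, 0, 0],
           [0, 1, 0], [0, 2, 0], [0, 3, 0]]
y_train = [0, 0, 0, 1, 1, 1]
models = fit_category_bases(X_train, y_train, rank=1)
```

With real text data, `X_train` would typically hold term-weight vectors such as tf-idf, and `rank` would be tuned on held-out documents; both choices are assumptions of this sketch rather than details given in the abstract.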