This paper presents a document classifier based on text content features and applies it to email classification. We test the validity of a classifier based on Principal Component Analysis Document Reconstruction (PCADR). The underlying idea is that principal component analysis (PCA) optimally compresses only the kind of documents (in our experiments, email classes) used to compute the principal components (PCs); for other kinds of documents, compression with only a few components performs poorly. The classifier therefore computes a separate PCA for each document class. When a new instance arrives, it is projected onto the PCs of each class and then reconstructed from those same PCs. The reconstruction error is computed for each class, and the classifier assigns the instance to the class with the smallest error, that is, the smallest divergence from the class representation. We evaluate this approach on email filtering by distinguishing between two message classes (e.g., spam versus ham, or phishing versus ham). The experiments show that PCADR obtains very good results on the different validation datasets employed, outperforming the popular Support Vector Machine classifier.
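The per-class reconstruction idea described in the abstract can be illustrated with a minimal sketch. Several details here are assumptions rather than the paper's actual setup: documents are taken to be numeric feature vectors (for example term frequencies), the reconstruction error is measured as a Euclidean norm, one component is kept per class, and the toy data is synthetic. The function names `fit_class_pca`, `reconstruction_error`, and `classify` are hypothetical.

```python
import numpy as np

def fit_class_pca(X, n_components):
    """Fit PCA on one class. Rows of X are documents, columns are features."""
    mean = X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def reconstruction_error(x, mean, components):
    """Project x onto the class PCs, reconstruct it, and return the residual norm."""
    centered = x - mean
    projected = components @ centered          # coordinates in the PC space
    reconstructed = components.T @ projected   # back to the feature space
    # Assumption: Euclidean distance is used as the error measure.
    return float(np.linalg.norm(centered - reconstructed))

def classify(x, models):
    """Assign x to the class whose PCs reconstruct it with the smallest error."""
    errors = {label: reconstruction_error(x, mean, comps)
              for label, (mean, comps) in models.items()}
    return min(errors, key=errors.get)

# Synthetic toy data (an assumption): each class lies near its own
# one-dimensional subspace of a five-dimensional feature space.
rng = np.random.default_rng(0)
dir_ham = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
dir_spam = np.array([0.0, 0.0, 0.0, 1.0, 1.0])

def make_class(direction, n=50):
    scale = rng.normal(size=(n, 1))
    noise = 0.01 * rng.normal(size=(n, direction.size))
    return scale * direction + noise

models = {
    "ham": fit_class_pca(make_class(dir_ham), n_components=1),
    "spam": fit_class_pca(make_class(dir_spam), n_components=1),
}

print(classify(2.0 * dir_ham, models))
print(classify(-1.5 * dir_spam, models))
```

In this sketch the number of retained components is the main tuning parameter: keeping too many lets every class reconstruct every document well, which removes the contrast in errors that the classifier relies on.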