Abstract. Support vector machines (SVMs) have shown superb performance for text classification tasks. They are accurate, robust, and quick to apply to test instances. Their only potential drawback is their training time and memory requirement. For n training instances held in memory, the best-known SVM implementations take time proportional to n^a, where a is typically between 1.8 and 2.1. SVMs have been trained on data sets with several thousand instances, but Web directories today contain millions of instances that are valuable for mapping billions of Web pages into Yahoo!-like directories. We present SIMPL, a nearly linear-time classification algorithm that mimics the strengths of SVMs while avoiding the training bottleneck. It uses Fisher's linear discriminant, a classical tool from statistical pattern recognition, to project training instances to a carefully selected low-dimensional subspace before inducing a decision tree on the projected instances. SIMPL uses efficient sequential scans and sorts and is comparable in speed and memory scalability to widely used naive Bayes (NB) classifiers, but it beats NB accuracy decisively. It not only approaches and sometimes exceeds SVM accuracy, but also beats the running time of a popular SVM implementation by orders of magnitude. While describing SIMPL, we make a detailed experimental comparison of SVM-generated discriminants with Fisher's discriminants, and we also report on an analysis of the cache performance of a popular SVM implementation. Our analysis shows that SIMPL has the potential to be the method of choice for practitioners who want the accuracy of SVMs and the simplicity and speed of naive Bayes classifiers.
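
The project-then-split idea described in the abstract can be illustrated with a minimal sketch. This is not the SIMPL implementation: it computes a single Fisher discriminant direction for two classes and fits a one-level decision tree (a threshold) on the projected values, whereas the abstract describes a low-dimensional subspace and a full decision tree. The function names `fisher_direction` and `best_threshold`, the ridge term, and the synthetic Gaussian data are all assumptions made for illustration.

```python
import numpy as np

def fisher_direction(X, y, ridge=1e-3):
    """Fisher's linear discriminant direction for two classes labeled 0 and 1."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    # Small ridge term (an assumption of this sketch) keeps the solve well-posed.
    Sw += ridge * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)

def best_threshold(z, y):
    """One-level decision tree on 1-D projections: predict 1 when z > t."""
    order = np.argsort(z)                      # one sort of the projected values
    z_sorted, y_sorted = z[order], y[order]
    n = len(z_sorted)
    ones_left = np.cumsum(y_sorted)            # class-1 points at or left of each cut
    right_size = n - np.arange(1, n + 1)       # number of points right of each cut
    zeros_right = right_size - (y_sorted.sum() - ones_left)
    errors = ones_left + zeros_right           # training errors for each cut position
    i = int(np.argmin(errors[:-1]))            # keep at least one point on each side
    return 0.5 * (z_sorted[i] + z_sorted[i + 1])

# Synthetic two-class data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
shift = np.array([3.0, 3.0, 0.0, 0.0, 0.0])
X = np.vstack([rng.normal(size=(200, 5)), rng.normal(size=(200, 5)) + shift])
y = np.concatenate([np.zeros(200, dtype=int), np.ones(200, dtype=int)])

w = fisher_direction(X, y)
z = X @ w                                      # project every instance onto w
t = best_threshold(z, y)
acc = float(np.mean((z > t).astype(int) == y))
print(f"training accuracy: {acc:.3f}")
```

After the discriminant is computed, the split search needs only one sort and one cumulative scan over the projected values, which is in the spirit of the sequential scans and sorts mentioned in the abstract.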