Support vector machines (SVMs) have shown superb performance on text classification tasks. They are accurate, robust, and quick to apply to test instances. Their only potential drawback is their training time and memory requirement. For n training instances held in memory, the best-known SVM implementations take time proportional to n^a, where a is typically between 1.8 and 2.1. SVMs have been trained on data sets with several thousand instances, but Web directories today contain millions of instances that are valuable for mapping billions of Web pages into Yahoo!-like directories. We present SIMPL, a nearly linear-time classification algorithm that mimics the strengths of SVMs while avoiding the training bottleneck. It uses Fisher's linear discriminant, a classical tool from statistical pattern recognition, to project training instances onto a carefully selected low-dimensional subspace before inducing a decision tree on the projected instances. SIMPL uses efficient sequential scans and sorts, and is comparable in speed and memory scalability to widely used naive Bayes (NB) classifiers, but it beats NB accuracy decisively. It not only approaches and sometimes exceeds SVM accuracy, but also beats SVM running time by orders of magnitude. While developing SIMPL, we also present a detailed experimental analysis of the cache performance of SVMs.
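The project-then-split idea described above can be illustrated with a minimal sketch. It is not the SIMPL implementation, whose details the abstract does not give. It assumes the textbook two-class Fisher discriminant (direction w solving S_w w = mu1 - mu0), a single projection direction, and a depth-1 decision tree (a stump) found with one sort and one sequential scan. All function names, the ridge term, and the toy data are hypothetical.

```python
# Illustrative sketch only: textbook two-class Fisher discriminant followed by
# a depth-1 decision tree (stump). All names and data here are assumptions.

def mean(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def scatter(rows, mu):
    """Scatter matrix: sum of outer products of (x - mu) over all rows."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for x in rows:
        diff = [xi - mi for xi, mi in zip(x, mu)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s

def solve(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w

def fisher_direction(neg, pos, ridge=1e-6):
    """Direction w with (S_w + ridge * I) w = mu_pos - mu_neg."""
    mu0, mu1 = mean(neg), mean(pos)
    s0, s1 = scatter(neg, mu0), scatter(pos, mu1)
    d = len(mu0)
    # The small ridge (an assumption) keeps the system solvable.
    sw = [[s0[i][j] + s1[i][j] + (ridge if i == j else 0.0)
           for j in range(d)] for i in range(d)]
    return solve(sw, [b - a for a, b in zip(mu0, mu1)])

def project(w, x):
    """Scalar projection of x onto the direction w."""
    return sum(wi * xi for wi, xi in zip(w, x))

def best_stump(values, labels):
    """One sort plus one scan: threshold with the fewest training errors.

    The stump predicts 1 when value > threshold, else 0.
    """
    order = sorted(zip(values, labels))
    n = len(order)
    # Threshold below every value: everything is predicted 1.
    errors = sum(1 for _, y in order if y == 0)
    best_thr, best_err = order[0][0] - 1.0, errors
    for k, (v, y) in enumerate(order):
        errors += 1 if y == 1 else -1  # point k moves to the "predict 0" side
        nxt = order[k + 1][0] if k + 1 < n else v + 2.0
        if nxt == v:
            continue  # cannot place a threshold between equal values
        if errors < best_err:
            best_thr, best_err = (v + nxt) / 2.0, errors
    return best_thr, best_err

# Toy two-dimensional data (hypothetical).
neg = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5], [2.0, 2.5]]
pos = [[4.0, 5.0], [5.0, 4.0], [4.5, 5.5], [5.5, 4.5]]

w = fisher_direction(neg, pos)
values = [project(w, x) for x in neg + pos]
labels = [0] * len(neg) + [1] * len(pos)
threshold, train_errors = best_stump(values, labels)
print(w, threshold, train_errors)
```

Once the instances are projected, choosing the split costs one sort and one linear pass, which is the kind of sequential access pattern the abstract credits for SIMPL's scalability. A fuller version would presumably project onto several directions and grow a deeper tree, but those details are not stated here.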