We consider feature selection for text classification both theoretically and empirically. Our main result is an unsupervised feature selection strategy for which we give worst-case theoretical guarantees on the generalization power of the resultant classification function f̃ with respect to the classification function f obtained when keeping all the features. To the best of our knowledge, this is the first feature selection method with such guarantees. In addition, the analysis leads to insights into when and why this feature selection strategy will perform well in practice. We then use the TechTC-100, 20-Newsgroups, and Reuters-RCV2 data sets to empirically evaluate the performance of this strategy and of two simpler but related feature selection strategies against two commonly used strategies. Our empirical evaluation shows that the strategy with provable performance guarantees performs well in comparison with other commonly used feature selection strategies. In addition, it performs better on certain data sets under very aggressive feature selection.