Text categorization is an important research area in many Information Retrieval (IR) applications. To save storage space and computation time in text categorization, efficient and effective algorithms for reducing the data before analysis are highly desirable. Traditional techniques for this purpose generally fall into two groups: feature extraction and feature selection. Because it is more efficient, feature selection is better suited to text data such as web documents. However, popular feature selection techniques such as Information Gain (IG) and the χ²-test (CHI) are greedy in nature and thus may not be optimal under a given criterion. Moreover, the performance of these greedy methods may deteriorate when the retained data dimension is extremely low. In this paper, we propose an efficient optimal feature selection algorithm, called Orthogonal Centroid Feature Selection (OCFS), which optimizes the objective function of the Orthogonal Centroid (OC) subspace learning algorithm in a discrete solution space. Experiments on 20 Newsgroups (20NG), Reuters Corpus Volume 1 (RCV1) and Open Directory Project (ODP) data show that OCFS consistently outperforms IG and CHI while requiring less computation time, especially when the reduced dimension is extremely small.
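The abstract does not spell out the OCFS scoring rule, so the following is only a minimal sketch under a stated assumption: that restricting the Orthogonal Centroid objective to a discrete (feature-picking) solution space reduces to scoring each feature by the class-weighted squared distance between each class centroid and the global centroid, then keeping the top-scoring features. The function names `ocfs_scores` and `select_features` and the toy data are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def ocfs_scores(X, y):
    """Score each feature, assuming score(i) = sum_j (n_j / n) * (m_j[i] - m[i])**2,
    where m_j is the centroid of class j and m is the global centroid."""
    n = len(X)
    d = len(X[0])

    # Global centroid over all training documents.
    global_centroid = [sum(row[i] for row in X) / n for i in range(d)]

    # Per-class feature sums and document counts.
    sums = defaultdict(lambda: [0.0] * d)
    counts = defaultdict(int)
    for row, label in zip(X, y):
        counts[label] += 1
        acc = sums[label]
        for i, value in enumerate(row):
            acc[i] += value

    # Accumulate the weighted squared centroid offsets per feature.
    scores = [0.0] * d
    for label, n_j in counts.items():
        for i in range(d):
            diff = sums[label][i] / n_j - global_centroid[i]
            scores[i] += (n_j / n) * diff * diff
    return scores


def select_features(X, y, k):
    """Return the indices (in ascending order) of the k highest-scoring features."""
    scores = ocfs_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: 4 documents, 3 term features, 2 classes.
X = [[1, 0, 5], [1, 0, 4], [0, 1, 5], [0, 1, 4]]
y = ["a", "a", "b", "b"]
scores = ocfs_scores(X, y)
selected = select_features(X, y, 2)
```

In this toy example the third feature has the same mean in both classes, so it receives a score of zero and is dropped first. Under the assumed rule, scoring needs only one pass to accumulate centroids and one pass over the features, which is consistent with the abstract's claim of low computation time; no iterative search over feature subsets is involved.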