In text categorization, feature selection (FS) is a strategy for making text classifiers more efficient and accurate. However, when facing a new task, it is still difficult to quickly choose a suitable method from the many FS methods proposed in previous studies. In this paper, we propose a theoretical framework for FS methods based on two basic measurements: frequency measurement and ratio measurement. Six popular FS methods are then discussed in detail under this framework. Moreover, guided by our theoretical analysis, we propose a novel method called weighed frequency and odds (WFO) that combines the two measurements with trained weights. Experimental results on data sets from both topic-based and sentiment classification tasks show that the new method is robust across different tasks and numbers of selected features.
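As a rough illustration of how a frequency measurement and a ratio measurement might be combined under a single weight, the sketch below scores each term for a target class. It assumes the frequency measurement is the smoothed document frequency P(t|c), the ratio measurement is log(P(t|c) / P(t|not c)), and the two are combined as a weighted product that is zero when the ratio does not exceed 1. The abstract does not spell out the formula, so this form, the add-one style smoothing, and the names `wfo_scores`, `select_features`, `lam`, and `smoothing` are all assumptions made for illustration. In particular, `lam` stands in for the trained weight and would normally be tuned on held-out data.

```python
import math
from collections import Counter


def wfo_scores(docs, labels, target, lam=0.5, smoothing=1.0):
    """Score each term for class `target` (illustrative sketch only).

    docs: list of token lists; labels: parallel list of class labels.
    lam: hypothetical weight in [0, 1] placed on the frequency measurement;
         (1 - lam) is placed on the ratio measurement.
    """
    pos_df, neg_df = Counter(), Counter()
    n_pos = n_neg = 0
    for tokens, label in zip(docs, labels):
        if label == target:
            n_pos += 1
            pos_df.update(set(tokens))  # document frequency, not raw counts
        else:
            n_neg += 1
            neg_df.update(set(tokens))

    scores = {}
    for term in set(pos_df) | set(neg_df):
        # Smoothed estimates of P(t | c) and P(t | not c); smoothing is assumed.
        p_pos = (pos_df[term] + smoothing) / (n_pos + 2 * smoothing)
        p_neg = (neg_df[term] + smoothing) / (n_neg + 2 * smoothing)
        ratio = p_pos / p_neg
        if ratio > 1.0:
            # Frequency term raised to lam, log-ratio term raised to (1 - lam).
            scores[term] = (p_pos ** lam) * (math.log(ratio) ** (1.0 - lam))
        else:
            scores[term] = 0.0
    return scores


def select_features(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    docs = [["good", "movie"], ["good", "plot"],
            ["bad", "movie"], ["bad", "plot"]]
    labels = ["pos", "pos", "neg", "neg"]
    scores = wfo_scores(docs, labels, target="pos", lam=0.5)
    print(select_features(scores, k=1))
```

Under these assumptions, setting `lam` close to 1 makes the score behave like a pure frequency measurement, while setting it close to 0 makes it behave like a pure ratio measurement, which is one way to read the claim that a single method can adapt to both topic-based and sentiment tasks.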