A framework of feature selection methods for text categorization

Authors:
Shoushan Li; Rui Xia; Chengqing Zong; Chu-Ren Huang
Affiliations:
The Hong Kong Polytechnic University; Institute of Automation, Chinese Academy of Sciences; Institute of Automation, Chinese Academy of Sciences; The Hong Kong Polytechnic University
Venue:
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2
Year:
2009

Abstract

In text categorization, feature selection (FS) aims to make text classifiers more efficient and accurate. However, when facing a new task, it is still difficult to quickly choose a suitable method from the many FS methods proposed in previous studies. In this paper, we propose a theoretical framework of FS methods based on two basic measurements: frequency measurement and ratio measurement. Six popular FS methods are then discussed in detail under this framework. Moreover, guided by our theoretical analysis, we propose a novel method called weighed frequency and odds (WFO) that combines the two measurements with trained weights. Experimental results on data sets from both topic-based and sentiment classification tasks show that the new method is robust across different tasks and different numbers of selected features.
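
The abstract does not state the WFO formula, so the following is only a minimal sketch of how a frequency measurement and a ratio measurement might be combined into one score. It assumes the frequency measurement is the smoothed document frequency P(t|c), the ratio measurement is log(P(t|c) / P(t|not c)), and a single hand-set weight `lam` stands in for the trained weights. The function names, the smoothing, and the toy data are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def term_probs(docs, labels, target):
    """Estimate P(t|c) and P(t|not c) as add-one smoothed document frequencies."""
    in_c = [set(d.split()) for d, y in zip(docs, labels) if y == target]
    out_c = [set(d.split()) for d, y in zip(docs, labels) if y != target]
    vocab = set().union(*in_c, *out_c)
    df_in = Counter(t for d in in_c for t in d)
    df_out = Counter(t for d in out_c for t in d)
    return {
        t: ((df_in[t] + 1) / (len(in_c) + 2), (df_out[t] + 1) / (len(out_c) + 2))
        for t in vocab
    }


def combined_score(p_in, p_out, lam):
    """Assumed combination: frequency^lam * log-ratio^(1 - lam).

    lam = 1 ranks by frequency alone, lam = 0 by the log ratio alone.
    Terms that are not more likely inside the class score zero.
    """
    ratio = p_in / p_out
    if ratio <= 1.0:
        return 0.0
    return (p_in ** lam) * (math.log(ratio) ** (1.0 - lam))


def select_features(docs, labels, target, k, lam=0.5):
    """Return the k highest-scoring terms for the target class (ties broken alphabetically)."""
    probs = term_probs(docs, labels, target)
    scored = {t: combined_score(p, q, lam) for t, (p, q) in probs.items()}
    return sorted(scored, key=lambda t: (-scored[t], t))[:k]


if __name__ == "__main__":
    docs = ["good great film", "good plot", "bad film", "bad boring plot"]
    labels = ["pos", "pos", "neg", "neg"]
    print(select_features(docs, labels, "pos", k=2))
```

In this sketch `lam` is fixed by hand; choosing it per task on held-out data would be one way to approximate the "trained weights" the abstract mentions.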