MMR-based feature selection for text categorization

Authors:
Changki Lee;Gary Geunbae Lee
Affiliations:
Pohang University of Science & Technology, Hyoja-Dong, Pohang, South Korea;Pohang University of Science & Technology, Hyoja-Dong, Pohang, South Korea
Venue:
HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
Year:
2004

Citing 10
Cited 2

Some inconsistencies and misnomers in probabilistic information retrieval

SIGIR '91 Proceedings of the 14th annual international ACM SIGIR conference on Research and development in information retrieval
A comparison of classifiers and document representations for the routing problem

SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Inducing Features of Random Fields

IEEE Transactions on Pattern Analysis and Machine Intelligence
The use of MMR, diversity-based reranking for reordering documents and producing summaries

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
A re-examination of text categorization methods

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
A statistical learning model of text classification for support vector machines

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Text Categorization with Support Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Using machine learning to improve information access

An effective feature selection method for text categorization

PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
The Effect of Stemming on Arabic Text Classification: An Empirical Study

International Journal of Information Retrieval Research

Abstract

We introduce a new feature selection method for text categorization. Our method, based on maximal marginal relevance (MMR), strives to reduce redundancy between features while maintaining information gain when selecting features for text categorization. Empirical results show that MMR-based feature selection is more effective than both Koller & Sahami's method, a greedy feature selection method, and conventional information gain, which is commonly used for feature selection in text categorization. Moreover, with MMR-based feature selection, conventional machine learning algorithms sometimes outperform SVM, which is known to give the best classification accuracy.
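
The abstract does not give the selection criterion itself, so the following is only a minimal sketch of the general idea: greedily pick the feature that trades off information gain with respect to the class against redundancy with the features already chosen. The function names, the weight `lam`, and the use of pairwise mutual information between binary term-presence vectors as the redundancy measure are all assumptions made for illustration, not the authors' formulation.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two aligned discrete sequences.

    For a term-presence vector and the class labels, this equals the
    information gain of the term.
    """
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def mmr_select(features, labels, k, lam=0.5):
    """Greedy MMR-style feature selection (illustrative sketch).

    features: dict mapping a feature name to a list of 0/1 values,
              one per document (hypothetical input format).
    labels:   list of class labels, one per document.
    k:        number of features to select.
    lam:      assumed trade-off weight; 1.0 ignores redundancy entirely.
    """
    # Relevance of each feature: information gain with respect to the class.
    relevance = {f: mutual_information(col, labels) for f, col in features.items()}
    selected = []
    candidates = set(features)

    def score(f):
        # Redundancy: highest mutual information with any selected feature.
        redundancy = max(
            (mutual_information(features[f], features[s]) for s in selected),
            default=0.0,
        )
        return lam * relevance[f] - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: "match" duplicates "goal", so it adds no new information.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "goal": [1, 1, 1, 0, 0, 0, 0, 0],
    "match": [1, 1, 1, 0, 0, 0, 0, 0],
    "ballot": [0, 0, 0, 1, 0, 0, 0, 0],
}
print(mmr_select(features, labels, k=2, lam=0.5))
print(mmr_select(features, labels, k=2, lam=1.0))
```

With `lam=1.0` the procedure reduces to ranking by information gain alone, which on the toy data selects the duplicated pair; lowering `lam` penalizes the duplicate and lets a less informative but complementary feature in.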