A term weighting approach for text categorization

Authors:
Kyung-Chan Lee;Seung-Shik Kang;Kwang-Soo Hahn
Affiliations:
School of Computer Science, Kookmin University & AITrc, Seoul, Korea;School of Computer Science, Kookmin University & AITrc, Seoul, Korea;School of Computer Science, Kookmin University & AITrc, Seoul, Korea
Venue:
AIRS'05 Proceedings of the Second Asia conference on Asia Information Retrieval Technology
Year:
2005

Abstract

Representative words in a document are commonly identified and discriminated by the statistical distribution of their frequencies. We assume that evaluating a confidence measure for terms through content-based document analysis leads to better performance than the parametric assumptions of the standard frequency-based method. In this paper, we propose a new term weighting method that replaces the frequency-based probabilistic methods. Experiments with Naïve Bayesian classifiers showed that our approach improved on the frequency-based method at each point of the evaluation.
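
The abstract does not give the weighting formula, so the following is only a minimal sketch of where a term weight could replace a raw frequency in a multinomial Naïve Bayes classifier. The names train_weighted_nb, classify, frequency_weights and presence_weights are hypothetical, and presence_weights is a placeholder, not the content-based confidence measure proposed in the paper. Passing frequency_weights reproduces the standard frequency-based model with additive smoothing.

```python
import math
from collections import Counter, defaultdict


def train_weighted_nb(docs, labels, weight_fn, alpha=1.0):
    """Fit a multinomial Naive Bayes model on weighted term evidence.

    docs      : list of token lists
    labels    : one class label per document
    weight_fn : hypothetical hook mapping a token list to {term: weight}
    alpha     : additive (Laplace) smoothing constant
    """
    class_docs = Counter(labels)
    term_mass = defaultdict(lambda: defaultdict(float))
    vocab = set()
    for tokens, label in zip(docs, labels):
        for term, weight in weight_fn(tokens).items():
            term_mass[label][term] += weight  # a weight stands in for a raw count
            vocab.add(term)

    model = {}
    for label, n_label in class_docs.items():
        total = sum(term_mass[label].values()) + alpha * len(vocab)
        log_like = {
            term: math.log((term_mass[label].get(term, 0.0) + alpha) / total)
            for term in vocab
        }
        model[label] = (math.log(n_label / len(docs)), log_like)
    return model


def classify(model, tokens, weight_fn):
    """Return the label with the highest weighted log-posterior score."""
    weights = weight_fn(tokens)
    best_label, best_score = None, -math.inf
    for label, (log_prior, log_like) in model.items():
        # Terms never seen in training are skipped.
        score = log_prior + sum(
            w * log_like[t] for t, w in weights.items() if t in log_like
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def frequency_weights(tokens):
    """Baseline: the weight of a term is its raw frequency in the document."""
    return {term: float(count) for term, count in Counter(tokens).items()}


def presence_weights(tokens):
    """Placeholder alternative (an assumption, not the paper's measure)."""
    return {term: 1.0 for term in set(tokens)}


# Toy data, invented purely for illustration.
train_docs = [
    ["ball", "goal", "goal"],
    ["vote", "election"],
    ["goal", "match"],
    ["vote", "party", "vote"],
]
train_labels = ["sports", "politics", "sports", "politics"]

baseline = train_weighted_nb(train_docs, train_labels, frequency_weights)
print(classify(baseline, ["goal", "ball"], frequency_weights))  # sports
```

Keeping the weighting behind a single function argument means the classifier itself is unchanged when one scheme is swapped for another, which is the kind of side-by-side comparison the abstract describes.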