Term-weighting approaches in automatic text retrieval
Information Processing and Management: an International Journal
The nature of statistical learning theory
The nature of statistical learning theory
Feature selection, perceptron learning, and a usability case study for text categorization
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
A re-examination of text categorization methods
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Hierarchical classification of Web content
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
A statistical learning learning model of text classification for support vector machines
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Information Retrieval
Modern Information Retrieval
Introduction to Modern Information Retrieval
Introduction to Modern Information Retrieval
Hierarchical Text Categorization Using Neural Networks
Information Retrieval
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
An extensive empirical study of feature selection metrics for text classification
The Journal of Machine Learning Research
Supervised term weighting for automated text categorization
Proceedings of the 2003 ACM symposium on Applied computing
Blocking Reduction Strategies in Hierarchical Text Classification
IEEE Transactions on Knowledge and Data Engineering
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
Hi-index | 0.00 |
In the automated text classification, tfidf is often considered as the default term weighting scheme and has been widely reported in literature. However, tfidf does not directly reflect terms' category membership. Inspired by the analysis of various feature selection methods, we propose a simple probability based term weighting scheme which directly utilizes two critical information ratios, i.e. relevance indicators. These relevance indicators are nicely supported by probability estimates which embody the category membership. Our experimental study based on two data sets, including Reuters-21578, demonstrates that the proposed probability based term weighting scheme outperforms tfidf significantly using Bayesian classifier and Support Vector Machines (SVM).