In text categorization, term weighting methods assign weights to terms in order to improve classification performance. In this study, we propose an effective term weighting scheme, tf.rf, and compare it with several widely used unsupervised and supervised term weighting methods on two popular data collections, using SVM and kNN classifiers. Our controlled experiments show that supervised term weighting methods are not consistently superior to unsupervised ones. Specifically, the three supervised methods based on information theory, i.e. tf.χ2, tf.ig and tf.or, perform rather poorly in all experiments. In contrast, the proposed tf.rf consistently achieves the best performance and outperforms the other methods substantially and significantly. The widely used tf.idf method does not perform uniformly well across the different data corpora.
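To make the contrast between an unsupervised and a supervised weight concrete, the following is a minimal Python sketch rather than the authors' implementation. The abstract does not state the tf.rf formula; the sketch assumes the relevance frequency form rf = log2(2 + a / max(1, c)), where a and c are the numbers of positive-category and negative-category training documents containing the term. The function names, the tokenised-list input format and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def doc_freq(docs):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # set(): count each term once per document
    return df


def tf_idf(doc, docs):
    """Unsupervised weight: raw term frequency times log(N / df)."""
    n = len(docs)
    df = doc_freq(docs)
    tf = Counter(doc)
    # Terms never seen in the collection are skipped (df would be 0).
    return {t: tf[t] * math.log(n / df[t]) for t in tf if df[t] > 0}


def tf_rf(doc, pos_docs, neg_docs):
    """Supervised weight: raw term frequency times an ASSUMED rf factor.

    Assumption (not given in the abstract): rf = log2(2 + a / max(1, c)),
    with a = positive-category documents containing the term and
    c = negative-category documents containing the term.
    """
    a = doc_freq(pos_docs)
    c = doc_freq(neg_docs)
    tf = Counter(doc)
    return {t: tf[t] * math.log2(2 + a[t] / max(1, c[t])) for t in tf}


# Hypothetical toy collection: two positive and two negative documents.
pos = [["wheat", "farm", "price"], ["wheat", "export"]]
neg = [["oil", "price"], ["bank", "price", "rate"]]
doc = ["wheat", "wheat", "price"]

rf_weights = tf_rf(doc, pos, neg)
idf_weights = tf_idf(doc, pos + neg)
print(rf_weights)
print(idf_weights)
```

Under this assumed definition, a term concentrated in the positive category ("wheat" in the toy data) receives a larger supervised weight than a term spread across both categories ("price"), whereas tf.idf ignores the category labels and depends only on how many documents contain the term.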