In this paper, a fast and effective text categorization method named TCBLDF is proposed. TCBLDF requires little dimensionality reduction beyond stop-word removal and document-frequency-based feature selection. It tries to capture the relationship between a term and a category label, thus eliminating the need to know the semantic contribution a term makes to the documents it occurs in. TCBLDF uses a measure to evaluate the importance of each term for the categorization task, and then assigns terms different weights according to these importance evaluations. In this way, important terms carry more influence in the classification decision. Finally, we compare the method to two conventional classification methods, Naive Bayes and a linear SVM. Experimental results show that TCBLDF is faster than the SVM with comparable performance, and more effective than Naive Bayes, so it can be a good alternative to these methods.
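The abstract describes two preprocessing steps: document-frequency-based feature selection and a term-importance score derived from the association between a term and a category label. The paper's actual importance measure is not given here, so the sketch below is only an illustration under assumptions: it filters terms by document frequency, then scores each surviving term by how concentrated its occurrences are in a single category (a hypothetical stand-in for the TCBLDF measure; the function names, stop-word list, and data are invented for this example).

```python
from collections import Counter, defaultdict

# A tiny illustrative stop-word list (assumption, not from the paper).
STOP_WORDS = frozenset({"the", "a", "of", "and", "on"})

def select_by_df(docs, min_df=2):
    """Document-frequency feature selection: keep terms that occur
    in at least min_df documents after stop-word removal."""
    df = Counter()
    for doc in docs:
        for term in set(doc.lower().split()) - STOP_WORDS:
            df[term] += 1
    return {t for t, n in df.items() if n >= min_df}

def term_importance(docs, labels, vocab):
    """A stand-in importance measure (NOT the paper's definition):
    the fraction of a term's document occurrences that fall in its
    most frequent category. Terms concentrated in one category score
    near 1.0; terms spread evenly across categories score lower."""
    per_cat = defaultdict(Counter)   # category -> term document counts
    total = Counter()                # term -> total document count
    for doc, label in zip(docs, labels):
        for term in set(doc.lower().split()) & vocab:
            per_cat[label][term] += 1
            total[term] += 1
    return {t: max(c[t] for c in per_cat.values()) / total[t] for t in vocab}

# Toy corpus with two categories (invented data for illustration).
docs = [
    "the cat sat on the mat",
    "a cat and a dog ran",
    "stocks rose and bonds fell",
    "stocks and the dog",
]
labels = ["pets", "pets", "finance", "finance"]

vocab = select_by_df(docs, min_df=2)          # {'cat', 'dog', 'stocks'}
weights = term_importance(docs, labels, vocab)
# 'cat' and 'stocks' appear in only one category, so they score 1.0;
# 'dog' appears once in each category, so it scores 0.5.
print(sorted(weights.items()))
```

These per-term weights could then scale the term features before a classification decision is made, so that category-discriminative terms dominate, which is the general idea the abstract attributes to TCBLDF.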