Text categorization is the process of assigning documents to a set of previously fixed categories. Considerable research aims to automate this time-consuming task. Many algorithms have been applied to it, and Support Vector Machines (SVMs) have shown very good results. In this report, we test the hypothesis that filtering the words used by the SVM before classification improves its overall performance. The hypothesis is tested systematically with three different measures of word relevance, on two different corpora (one of them considered in three different splits), and with both local and global vocabularies. The results show that filtering significantly improves the recall of the method and, as a result, significantly improves its overall performance. © 2005 Wiley Periodicals, Inc.
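The filtering step can be illustrated with a minimal sketch. The abstract does not name its three relevance measures, so this example assumes information gain as a stand-in. It shows two ways to build the vocabulary. A local vocabulary is chosen separately for each category's binary classifier. A global vocabulary is one shared set, scored here by the best information gain over all categories, which is an assumed policy. The function names (`info_gain`, `local_vocab`, `global_vocab`, `filter_doc`) are hypothetical, and the SVM training step itself is omitted.

```python
import math


def entropy(p):
    """Binary entropy in bits; 0 * log(0) is taken as 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def info_gain(docs, labels, term, category):
    """Information gain of term presence for the binary task 'category vs rest'.

    Assumed stand-in relevance measure (not necessarily one used in the paper).
    docs: list of sets of words; labels: list of sets of categories per document.
    """
    n = len(docs)
    in_cat = [category in ls for ls in labels]
    has_term = [term in d for d in docs]
    h_c = entropy(sum(in_cat) / n)
    # Expected entropy of the category after observing term presence/absence.
    cond = 0.0
    for flag in (True, False):
        idx = [i for i in range(n) if has_term[i] == flag]
        if idx:
            p_c_given = sum(in_cat[i] for i in idx) / len(idx)
            cond += len(idx) / n * entropy(p_c_given)
    return h_c - cond


def local_vocab(docs, labels, category, k):
    """Top-k terms for one category's classifier (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-info_gain(docs, labels, t, category), t))
    return ranked[:k]


def global_vocab(docs, labels, k):
    """One shared top-k vocabulary; a term scores its maximum gain over categories."""
    vocab = set().union(*docs)
    cats = set().union(*labels)

    def score(t):
        return max(info_gain(docs, labels, t, c) for c in cats)

    return sorted(vocab, key=lambda t: (-score(t), t))[:k]


def filter_doc(doc, vocab):
    """Keep only the selected terms; this reduced set is what the SVM would see."""
    return doc & set(vocab)


if __name__ == "__main__":
    docs = [{"goal", "match", "team"}, {"team", "win", "cup"},
            {"stock", "market", "price"}, {"price", "market", "trade"}]
    labels = [{"sport"}, {"sport"}, {"finance"}, {"finance"}]
    sport_terms = local_vocab(docs, labels, "sport", 3)
    print(sport_terms)
    print(filter_doc({"goal", "team", "price"}, sport_terms))
```

Information gain rewards terms that are strongly present or strongly absent in a category. This is why "market" and "price" rank highly even for the "sport" classifier in the toy data. With more than two categories, the local and global vocabularies would generally differ.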