Improving performance of text categorization by combining filtering and support vector machines: Research Articles

Authors:
Irene Díaz;José Ranilla;Elena Montañes;Javier Fernández;Elías F. Combarro
Affiliations:
Artificial Intelligence Center, University of Oviedo, Campus de Viesques, Gijón (Asturias), Spain (all authors)
Venue:
Journal of the American Society for Information Science and Technology
Year:
2004

Citing 0
Cited 14

Scoring and Selecting Terms for Text Categorization
IEEE Intelligent Systems

Introducing a Family of Linear Measures for Feature Selection in Text Categorization
IEEE Transactions on Knowledge and Data Engineering

Angular measures for feature selection in text categorization
Proceedings of the 2006 ACM symposium on Applied computing

A study of local and global thresholding techniques in text categorization
AusDM '06 Proceedings of the fifth Australasian conference on Data mining and analytics - Volume 61

The Chinese text categorization system with association rule and category priority
Expert Systems with Applications: An International Journal

Words as classifiers of documents according to their historical period and the ethnic origin of their authors
Cybernetics and Systems

Using Laplace and angular measures for Feature Selection in Text Categorisation
International Journal of Advanced Intelligence Paradigms

Set Cover Feature Selection for Text Categorisation and spam detection
International Journal of Advanced Intelligence Paradigms

A framework of automatic subject term assignment for text categorization: An indexing conception-based approach
Journal of the American Society for Information Science and Technology

Cuisine: Classification using stylistic feature sets and/or name-based feature sets
Journal of the American Society for Information Science and Technology

A framework for diagnosis of urinary incontinence disease based on scoring measures and automatic classifiers
Computers in Biology and Medicine

Using thesaurus to improve multiclass text classification
CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part II

A multi-class SVM classification system based on learning methods from indistinguishable Chinese official documents
Expert Systems with Applications: An International Journal

Identifying historical period and ethnic origin of documents using stylistic feature sets
DS'06 Proceedings of the 9th international conference on Discovery Science

Abstract

Text Categorization is the process of assigning documents to a set of previously fixed categories. Much current research aims to automate this time-consuming task. Several different algorithms have been applied, and Support Vector Machines (SVM) have shown very good results. In this report, we test the hypothesis that filtering the words used by the SVM before classification improves its overall performance. This hypothesis is systematically tested with three different measures of word relevance, on two different corpora (one of them considered in three different splits), and with both local and global vocabularies. The results show that filtering significantly improves the recall of the method, which in turn significantly improves the overall performance. © 2005 Wiley Periodicals, Inc.
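
As a rough illustration of the filtering step described in the abstract, the following sketch scores each word for one category and keeps only the top-scoring fraction before any classifier would be trained. The abstract does not name the three relevance measures, so information gain is used here purely as an assumed stand-in; the function names, the `keep_fraction` parameter, and the toy corpus are all hypothetical, and the SVM itself is left out.

```python
import math
from collections import Counter


def information_gain(docs, labels, category):
    """Score each word by its information gain for one category.

    Assumption: information gain stands in for the (unnamed) relevance
    measures of the abstract. The problem is treated as binary:
    a document either belongs to `category` or it does not.
    """
    n = len(docs)
    pos = sum(1 for y in labels if y == category)

    def entropy(k, total):
        # Binary entropy of a group with k positives out of `total` documents.
        if total == 0 or k == 0 or k == total:
            return 0.0
        p = k / total
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    df = Counter()      # number of documents containing each word
    df_pos = Counter()  # number of `category` documents containing each word
    for words, y in zip(docs, labels):
        for w in set(words):
            df[w] += 1
            if y == category:
                df_pos[w] += 1

    base = entropy(pos, n)
    scores = {}
    for w, k in df.items():
        with_w = entropy(df_pos[w], k)
        without_w = entropy(pos - df_pos[w], n - k)
        scores[w] = base - (k / n) * with_w - ((n - k) / n) * without_w
    return scores


def filter_vocabulary(scores, keep_fraction):
    """Keep the top-scoring fraction of words (always at least one word)."""
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    keep = max(1, int(len(ranked) * keep_fraction))
    return set(ranked[:keep])


# Hypothetical toy corpus: each document is a list of words.
docs = [
    "wheat prices rise on export demand".split(),
    "wheat harvest hit by drought".split(),
    "bank raises interest rates".split(),
    "central bank holds rates on weak demand".split(),
]
labels = ["grain", "grain", "money", "money"]

scores = information_gain(docs, labels, "grain")
vocab = filter_vocabulary(scores, keep_fraction=0.2)

# Documents reduced to the filtered vocabulary; these would then be
# vectorized and passed to an SVM in place of the full documents.
filtered_docs = [[w for w in d if w in vocab] for d in docs]
```

In this sketch, words that occur equally often inside and outside the category (such as "on" and "demand" in the toy corpus) receive a score of zero and are dropped, while words concentrated on one side of the split are kept. How aggressively to filter (`keep_fraction`) is exactly the kind of setting the reported experiments would vary.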