A comparison of the performance of SVM and ARNI on Text Categorization with new filtering measures on an unbalanced collection

Authors:
Elías F. Combarro; Elena Montañés; José Ranilla; Javier Fernández
Affiliations:
Computer Science Department, University of Oviedo, Gijón (Asturias), Spain; Artificial Intelligence Center, University of Oviedo, Gijón (Asturias), Spain; Artificial Intelligence Center, University of Oviedo, Gijón (Asturias), Spain; Artificial Intelligence Center, University of Oviedo, Gijón (Asturias), Spain
Venue:
IWANN '03 Proceedings of the 7th International Work-Conference on Artificial and Natural Neural Networks: Part II: Artificial Neural Nets Problem Solving Methods
Year:
2009

Abstract

Text Categorization (TC) is the process of assigning documents to a set of previously fixed categories. Because of the great amount of information available, much current research aims to automate this time-consuming task. Machine Learning (ML) algorithms have recently been applied for this purpose. In this paper, we compare the performance of two of these algorithms (SVM and ARNI) on a collection in which documents are distributed unevenly among the categories. Before learning, feature reduction is applied using both classical measures (information gain and term frequency) and three new measures that we propose here for the first time. We also compare the performance of these filtering measures.
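
As an illustration of the two classical filtering measures named in the abstract, the following is a minimal sketch of scoring terms by term frequency and by information gain on a toy corpus. It is not the authors' implementation: the corpus, the function names, and the choice to treat information gain as the reduction in category entropy given the presence or absence of a term are all assumptions made here for illustration. The three new measures proposed in the paper are not described in the abstract and are not reproduced.

```python
import math
from collections import Counter


def term_frequency(docs):
    """Total number of occurrences of each term across all documents."""
    tf = Counter()
    for tokens, _label in docs:
        tf.update(tokens)
    return tf


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, term):
    """Reduction in category entropy from knowing whether `term` occurs."""
    with_term = Counter(label for tokens, label in docs if term in tokens)
    without_term = Counter(label for tokens, label in docs if term not in tokens)
    all_labels = Counter(label for _tokens, label in docs)
    gain = entropy(all_labels.values())
    for part in (with_term, without_term):
        size = sum(part.values())
        if size:  # skip an empty side of the split
            gain -= size / len(docs) * entropy(part.values())
    return gain


def top_terms(docs, k, score):
    """Keep the k best-scoring terms; ties are broken alphabetically."""
    vocab = {t for tokens, _label in docs for t in tokens}
    return sorted(vocab, key=lambda t: (-score(t), t))[:k]


# Hypothetical, deliberately unbalanced toy corpus: (tokens, category).
DOCS = [
    (["wheat", "price", "export"], "grain"),
    (["stock", "price", "market"], "finance"),
    (["stock", "market", "bank"], "finance"),
    (["bank", "price", "rate"], "finance"),
]

if __name__ == "__main__":
    tf = term_frequency(DOCS)
    print("by term frequency:", top_terms(DOCS, 3, lambda t: tf[t]))
    print("by information gain:",
          top_terms(DOCS, 3, lambda t: information_gain(DOCS, t)))
```

In this toy corpus, "price" is the most frequent term but occurs in both categories, so it scores low on information gain, whereas "wheat" and "export" occur only in the minority category and score highest. This is the kind of disagreement between filtering measures that matters on an unbalanced collection.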