An anti-noise text categorization method based on support vector machines

Authors:
Lin Chen;Jie Huang;Zheng-Hu Gong
Affiliations:
School of Computer Science, National University of Defense Technology, Changsha, China;School of Computer Science, National University of Defense Technology, Changsha, China;School of Computer Science, National University of Defense Technology, Changsha, China
Venue:
AWIC'05 Proceedings of the Third international conference on Advances in Web Intelligence
Year:
2005

Abstract

Text categorization has become one of the key techniques for handling and organizing web data. In theory, the inherent properties of SVMs (Support Vector Machines) make them better suited to text categorization than Naive Bayes, yet in practice the classification precision of SVMs is lower than that of the Bayesian method. This paper investigates the discrepancy by analyzing the shortcomings of SVMs and presents an anti-noise SVM method. The improved method has two characteristics: 1) it chooses the optimal n-dimensional classifying hyperspace; 2) it separates out noise samples during preprocessing and trains the classifier on noise-free samples. Compared with the naive Bayes method, the anti-noise SVM improves classification precision by about 3 to 9 percent.
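
The second characteristic (filter out noise samples first, then train on what remains) can be illustrated with a minimal sketch. The abstract does not say how noise samples are identified or how the hyperspace is chosen, so everything below is an assumption made for illustration: the filter is a plain k-nearest-neighbour label-agreement check, the trainer is a Pegasos-style stochastic subgradient linear SVM without a bias term, and the function names (`filter_noise`, `train_linear_svm`, `predict`) and the toy term counts are hypothetical. It is not the authors' algorithm.

```python
import random

def filter_noise(X, y, k=3):
    """Keep samples whose label agrees with the majority of their k nearest
    neighbours (an assumed stand-in for the paper's noise separation step)."""
    kept_X, kept_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # (squared Euclidean distance, label) for every other sample, nearest first
        neighbours = sorted(
            (sum((a - b) ** 2 for a, b in zip(xi, xj)), y[j])
            for j, xj in enumerate(X) if j != i
        )
        vote = sum(label for _, label in neighbours[:k])
        if vote * yi > 0:  # majority of neighbours share this sample's label
            kept_X.append(xi)
            kept_y.append(yi)
    return kept_X, kept_y

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient descent on the regularised hinge loss.
    Labels must be +1 or -1; no bias term, to keep the sketch short."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularisation)
            if margin < 1:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Toy term counts for three hypothetical terms: ["goal", "stock", "the"].
# +1 = sports, -1 = finance; two samples carry deliberately wrong labels.
X = [[3, 0, 1], [4, 0, 2], [5, 0, 1], [4, 0, 1], [3, 0, 2],
     [0, 3, 1], [0, 4, 2], [0, 5, 1], [0, 4, 1], [0, 3, 2]]
y = [1, 1, 1, -1, 1,      # [4, 0, 1] is mislabelled as finance
     -1, -1, -1, 1, -1]   # [0, 4, 1] is mislabelled as sports

X_clean, y_clean = filter_noise(X, y)   # 8 of the 10 samples survive
w = train_linear_svm(X_clean, y_clean)  # train only on the filtered samples
print(len(X_clean), predict(w, [5, 0, 0]), predict(w, [0, 5, 0]))
```

In this sketch `k` is kept odd so that a vote over +1/-1 labels can never tie. A real pipeline would replace both the filter and the trainer with whatever the full paper specifies.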