An anti-noise text categorization method based on support vector machines

  • Authors:
  • Lin Chen; Jie Huang; Zheng-Hu Gong

  • Affiliations:
  • School of Computer Science, National University of Defense Technology, Changsha, China (all three authors)

  • Venue:
  • AWIC'05: Proceedings of the Third International Conference on Advances in Web Intelligence
  • Year:
  • 2005

Abstract

Text categorization has become one of the key techniques for handling and organizing web data. Although SVM (Support Vector Machines) is in theory better suited to text categorization than naive Bayes, in real-world use its classification precision is lower than that of the Bayesian method. This paper investigates the reasons for this gap by analyzing the shortcomings of SVM, and presents an anti-noise SVM method. The improved method has two characteristics: 1) it chooses the optimal n-dimensional classification hyperspace; 2) it separates out noise samples during preprocessing and trains the classifier on the noise-free samples only. Compared with the naive Bayes method, anti-noise SVM increases classification precision by about 3 to 9 percent.
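The abstract does not say how noise samples are detected, so the following is only a minimal sketch of the general "filter noise, then retrain" idea behind the second characteristic, not the authors' algorithm. It assumes a linear SVM trained by Pegasos-style stochastic subgradient descent (a swapped-in trainer), a hypothetical margin-based noise criterion (a training sample is flagged when y * f(x) falls below a threshold, i.e. it lies on the wrong side of the hyperplane), and hypothetical helper names (`train_linear_svm`, `flag_noise`, `anti_noise_train`). Selecting the optimal n-dimensional feature space (the first characteristic) is not shown.

```python
import random


def augment(x):
    """Append a constant 1.0 so the bias is learned as an ordinary weight."""
    return list(x) + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized
    hinge loss. X: list of feature vectors, y: labels in {-1, +1}.
    Returns a weight vector whose last entry acts as the bias."""
    rng = random.Random(seed)
    data = [augment(x) for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # Regularization step: shrink all weights toward zero.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss step: only samples inside the margin move the hyperplane.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def decision(w, x):
    """Signed score of x under weights w (positive means class +1)."""
    return sum(wj * xj for wj, xj in zip(w, augment(x)))


def flag_noise(X, y, threshold=0.0, **svm_kwargs):
    """Hypothetical noise criterion: train on all samples, then flag those
    whose functional margin y * f(x) falls below `threshold`."""
    w = train_linear_svm(X, y, **svm_kwargs)
    return [i for i in range(len(X)) if y[i] * decision(w, X[i]) < threshold]


def anti_noise_train(X, y, threshold=0.0, **svm_kwargs):
    """Drop flagged samples, then retrain on the remaining (assumed clean) ones.
    Returns (weights, sorted indices of removed samples)."""
    noisy = set(flag_noise(X, y, threshold, **svm_kwargs))
    X_clean = [x for i, x in enumerate(X) if i not in noisy]
    y_clean = [label for i, label in enumerate(y) if i not in noisy]
    return train_linear_svm(X_clean, y_clean, **svm_kwargs), sorted(noisy)


if __name__ == "__main__":
    # Toy 2-feature data; the last sample's label is deliberately flipped.
    X = [(1, 0.5), (2, -0.5), (3, 0.5), (4, -0.5), (5, 0.5),
         (-1, 0.5), (-2, -0.5), (-3, 0.5), (-4, -0.5), (-5, 0.5),
         (4, 0.0)]
    y = [1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1]
    w, removed = anti_noise_train(X, y)
    print("removed sample indices:", removed)
```

With `threshold=0.0` only samples the first-pass classifier gets wrong are discarded; a positive threshold would also discard correctly classified samples that sit inside the margin, trading more aggressive cleaning for fewer training examples. Folding the bias into the weight vector means the bias is regularized too, a common simplification in this kind of sketch.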