A technique for improving the performance of naive Bayes text classification

  • Authors:
  • Yuqian Jiang;Huaizhong Lin;Xuesong Wang;Dongming Lu

  • Affiliations:
  • College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang, China (all four authors)

  • Venue:
  • WISM'11 Proceedings of the 2011 international conference on Web information systems and mining - Volume Part II
  • Year:
  • 2011

Abstract

The naive Bayes classifier is widely used in text classification and often performs surprisingly well, so it is commonly treated as a baseline. However, previous research shows that a skewed distribution of the training collection can lead to poor classification results. This paper presents a new method for handling this situation: we introduce a conditional probability that takes into account information from both the whole corpus and each individual category. The proposed method performs well on standard benchmark collections and is competitive with state-of-the-art text classifiers, especially on skewed data.
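
The abstract does not give the formula for the proposed conditional probability, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a multinomial naive Bayes model in which each per-category word probability (Laplace-smoothed) is linearly interpolated with the word's frequency in the whole corpus. The function names `train` and `predict` and the mixing weight `lam` are hypothetical choices made for this example.

```python
import math
from collections import Counter


def train(docs, labels, lam=0.5):
    """Fit a toy naive Bayes model.

    docs:   list of token lists
    labels: list of category names, one per document
    lam:    ASSUMED mixing weight between category-level and
            corpus-level word probabilities (not from the paper)
    """
    vocab = {w for d in docs for w in d}
    corpus_counts = Counter(w for d in docs for w in d)
    corpus_total = sum(corpus_counts.values())

    docs_per_class = Counter(labels)
    class_counts = {}
    for d, c in zip(docs, labels):
        class_counts.setdefault(c, Counter()).update(d)

    model = {"vocab": vocab, "priors": {}, "cond": {}}
    for c, counts in class_counts.items():
        total = sum(counts.values())
        model["priors"][c] = math.log(docs_per_class[c] / len(docs))
        model["cond"][c] = {
            # Category estimate (Laplace-smoothed) mixed with the
            # corpus-wide estimate; both parts sum to 1 over vocab.
            w: math.log(
                lam * (counts[w] + 1) / (total + len(vocab))
                + (1 - lam) * corpus_counts[w] / corpus_total
            )
            for w in vocab
        }
    return model


def predict(model, doc):
    """Return the category with the highest log posterior score."""
    scores = {
        c: prior + sum(model["cond"][c][w] for w in doc if w in model["vocab"])
        for c, prior in model["priors"].items()
    }
    return max(scores, key=scores.get)
```

With a skewed training set (for example, four documents in one category and one in another), the corpus-level term keeps the estimates for the small category from being driven entirely by its few training tokens. Whether this matches the behaviour of the paper's method cannot be determined from the abstract alone.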