Automatically computed document dependent weighting factor facility for Naïve Bayes classification

  • Authors: Lam Hong Lee; Dino Isa
  • Affiliations: Department of Knowledge Science, Faculty of Science, Engineering and Technology, Universiti Tunku Abdul Rahman, Perak Campus, Jalan Universiti, Bandar Barat, 31900 Kampar, Perak, Malaysia; Intelligent Systems Research Group, Faculty of Engineering, The University of Nottingham, Malaysia Campus, Jalan Broga, 43500 Semenyih, Selangor, Malaysia
  • Venue: Expert Systems with Applications: An International Journal
  • Year: 2010

Quantified Score

Hi-index 12.05

Abstract

The Naive Bayes classification approach has been widely implemented in real-world applications because of its simplicity and its low-cost training and classification algorithms. As a trade-off for this simplicity, however, Naive Bayes has been reported to be one of the poorest-performing classification methods. We have investigated the Naive Bayes approach and found that one cause of its low classification accuracy is the misclassification of documents into a few "popular" categories. This happens when the training dataset is poorly organized, so that the distribution of training documents among categories is highly skewed. In this work, we propose a solution to this problem: the addition of the Automatically Computed Document Dependent (ACDD) weighting factor facility to the Naive Bayes classifier. The ACDD weighting factors are computed to enhance classification performance by adjusting the probability values according to the density of classified documents in each available category, with the aim of minimizing the misclassification rate.
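The abstract does not state how the ACDD weighting factors are computed, so the following Python sketch should not be read as the authors' method. It only illustrates the general idea of adjusting Naive Bayes scores by the density of documents already classified into each category. The function names, the `strength` parameter, and the linear penalty on the log score are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Train a multinomial Naive Bayes model with Laplace smoothing.

    docs is a list of (tokens, label) pairs.
    Returns (log_prior, log_like, vocab).
    """
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)

    total = len(docs)
    log_prior = {c: math.log(n / total) for c, n in label_counts.items()}
    log_like = {}
    for c in label_counts:
        denom = sum(word_counts[c].values()) + len(vocab)
        log_like[c] = {w: math.log((word_counts[c][w] + 1) / denom) for w in vocab}
    return log_prior, log_like, vocab


def log_scores(tokens, log_prior, log_like, vocab):
    """Plain Naive Bayes log score per category; unknown tokens are ignored."""
    return {
        c: log_prior[c] + sum(log_like[c][w] for w in tokens if w in vocab)
        for c in log_prior
    }


def classify_with_density_weights(batch, model, strength=1.0):
    """Classify documents in order, penalising categories that are already dense.

    ASSUMPTION: the penalty (strength * share of documents assigned so far)
    is a hypothetical stand-in, not the ACDD formula from the paper.
    """
    log_prior, log_like, vocab = model
    assigned = Counter()
    labels = []
    for tokens in batch:
        scores = log_scores(tokens, log_prior, log_like, vocab)
        n = sum(assigned.values())
        adjusted = {}
        for c, s in scores.items():
            density = assigned[c] / n if n else 0.0
            adjusted[c] = s - strength * density
        best = max(adjusted, key=adjusted.get)
        assigned[best] += 1
        labels.append(best)
    return labels
```

With `strength=0.0` the function reduces to ordinary Naive Bayes, which makes it easy to compare the two behaviours on a skewed training set. Larger values of `strength` push ambiguous documents away from categories that have already absorbed a large share of the batch.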