OFFD: Optimal Flexible Frequency Discretization for Naïve Bayes Classification

  • Authors:
  • Song Wang;Fan Min;Zhihai Wang;Tianyu Cao

  • Affiliations:
  • Department of Computer Science, The University of Vermont, Burlington, USA 05405;Department of Computer Science, The University of Vermont, Burlington, USA 05405 and Department of Computer Science and Engineering, University of Electronic Science and Technology of China, Sichu ...;School of Computer Science and Information Technology, Beijing Jiaotong University, Beijing, China 100044;Department of Computer Science, The University of Vermont, Burlington, USA 05405

  • Venue:
  • ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
  • Year:
  • 2009

Abstract

Incremental Flexible Frequency Discretization (IFFD) is a recently proposed discretization approach for Naïve Bayes (NB). IFFD performs satisfactorily by setting the minimal interval frequency for discretized intervals to a fixed number. In this paper, we first argue that this setting cannot guarantee optimal classification performance in terms of classification error. We observed empirically that an optimal minimal interval frequency exists for each dataset. We therefore propose a sequential-search, wrapper-based incremental discretization method for NB, named Optimal Flexible Frequency Discretization (OFFD). Experiments were conducted on 17 datasets from the UCI machine learning repository, comparing NB trained on data discretized by OFFD, IFFD, PKID, and FFD, respectively. The results show that OFFD works better than these alternatives for NB. Further experiments comparing NB trained on OFFD-discretized data with C4.5 show that the new method outperforms C4.5 on most of the datasets tested.
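
The wrapper idea in the abstract (try candidate minimal interval frequencies in sequence and keep the one with the lowest classification error) can be illustrated with a minimal sketch. This is not the authors' OFFD implementation: the discretizer below is a plain, non-incremental equal-frequency binning used as a stand-in for IFFD, the error estimate is assumed to come from simple k-fold cross-validation, and all function names (`cut_points`, `train_nb`, `predict_nb`, `cv_error`, `search_min_freq`) are hypothetical.

```python
import bisect
import math
from collections import Counter, defaultdict


def cut_points(values, min_freq):
    """Equal-frequency cut points with roughly >= min_freq values per interval.

    Stand-in for IFFD (assumption): ties may leave some intervals smaller.
    """
    s = sorted(values)
    n_bins = max(1, len(s) // min_freq)
    size = len(s) / n_bins
    cuts = []
    for k in range(1, n_bins):
        c = s[int(k * size)]
        if not cuts or c > cuts[-1]:  # skip duplicate cut points
            cuts.append(c)
    return cuts


def train_nb(X, y, min_freq):
    """Discretize every attribute, then count (attribute, bin, class) triples."""
    cuts = [cut_points([row[j] for row in X], min_freq) for j in range(len(X[0]))]
    class_counts = Counter(y)
    counts = defaultdict(int)
    for row, label in zip(X, y):
        for j, v in enumerate(row):
            counts[(j, bisect.bisect_right(cuts[j], v), label)] += 1
    return cuts, class_counts, counts


def predict_nb(model, row):
    """Return the class with the highest Laplace-smoothed log posterior."""
    cuts, class_counts, counts = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label, cc in sorted(class_counts.items()):
        score = math.log(cc / total)
        for j, v in enumerate(row):
            b = bisect.bisect_right(cuts[j], v)
            n_bins = len(cuts[j]) + 1
            score += math.log((counts.get((j, b, label), 0) + 1) / (cc + n_bins))
        if score > best_score:
            best, best_score = label, score
    return best


def cv_error(X, y, min_freq, folds=5):
    """Cross-validated error rate of NB for one minimal interval frequency."""
    errors = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        model = train_nb([X[i] for i in train], [y[i] for i in train], min_freq)
        errors += sum(predict_nb(model, X[i]) != y[i] for i in test)
    return errors / len(X)


def search_min_freq(X, y, candidates):
    """Sequential search: keep the candidate with the lowest estimated error."""
    best_freq, best_err = None, math.inf
    for m in candidates:
        err = cv_error(X, y, m)
        if err < best_err:
            best_freq, best_err = m, err
    return best_freq, best_err
```

A call such as `search_min_freq(X, y, [5, 10, 20, 30])`, with `X` a list of numeric rows and `y` the class labels, returns the selected frequency together with its estimated error. The candidate list and the number of folds are illustrative choices; the abstract does not specify the search range or the evaluation protocol.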