A Boolean function approach to feature selection in consistent decision information systems

  • Authors:
  • Sirzat Kahramanli;Mehmet Hacibeyoglu;Ahmet Arslan

  • Affiliations:
  • Department of Computer Engineering, Selcuk University, Konya, Turkey;Department of Computer Engineering, Selcuk University, Konya, Turkey;Department of Computer Engineering, Selcuk University, Konya, Turkey

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2011

Abstract

The goal of feature selection (FS) is to find a minimal subset (MS) R of the condition feature set C such that R has the same classification power as C, and then to reduce the dataset by discarding all features not contained in R. A dataset usually has many MSs, and finding all of them is known to be an NP-hard problem. Therefore, when only one MS is required, a heuristic that finds only one or a small number of possible MSs is used, but in this case there is a risk that the best MSs will be overlooked. When the best solution of an FS task is required, the discernibility matrix (DM)-based approach, which generates all MSs, is used. Two factors often cause DM-based FS programs to fail by overflowing the computer's memory. One is the large size of the discernibility function (DF) for large datasets; the other is the intractable space complexity of converting a DF to disjunctive normal form (DNF). However, most of the terms of a DF, and most of the temporary results generated during DF-to-DNF conversion, are usually redundant. Consequently, the minimized DF (DF_min) and the final DNF are usually much simpler than the original DF and the temporary results, respectively. Based on these facts, we developed a logic function-based feature selection method that derives DF_min from the truth table image of a dataset and converts it to DNF while preventing the occurrence of redundant terms. The proposed method requires no more memory than is needed to construct DF_min and the final DNF separately. Owing to this property, it can process most datasets that cannot be processed by DM-based programs.
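
The general idea behind the abstract (build a discernibility function, drop absorbed clauses, then expand to DNF while pruning redundant terms) can be illustrated with a minimal sketch. This is not the authors' truth-table-based method, whose details the abstract does not give; it is a generic pairwise construction with the absorption law applied at each step. The function names, the toy decision table, and the use of feature indices are all assumptions made for illustration.

```python
from itertools import combinations


def discernibility_clauses(rows, decisions):
    """Build the DF as a set of clauses (CNF).

    Each clause is the set of feature indices on which two objects
    with different decisions differ.
    """
    clauses = set()
    for (x, dx), (y, dy) in combinations(zip(rows, decisions), 2):
        if dx != dy:
            clause = frozenset(i for i, (a, b) in enumerate(zip(x, y)) if a != b)
            if not clause:
                # Identical conditions but different decisions.
                raise ValueError("decision table is not consistent")
            clauses.add(clause)
    return clauses


def absorb(sets):
    """Absorption law: drop every set that has a proper subset in the collection."""
    sets = set(sets)
    return {s for s in sets if not any(t < s for t in sets)}


def minimal_subsets(rows, decisions):
    """Return all minimal feature subsets as the terms of the final DNF."""
    # Minimized DF: only the non-absorbed clauses remain.
    df_min = absorb(discernibility_clauses(rows, decisions))
    terms = {frozenset()}
    # Multiply in one clause at a time, pruning absorbed terms immediately
    # so that redundant intermediate terms are discarded at every step.
    for clause in sorted(df_min, key=len):
        terms = absorb({t | {f} for t in terms for f in clause})
    return terms


# Hypothetical toy table: features (a, b, c) with c = a XOR b, decision d = c.
rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
decisions = [0, 1, 1, 0]
result = minimal_subsets(rows, decisions)
print(sorted(sorted(s) for s in result))
```

For this toy table the DF reduces to (b OR c) AND (a OR c), whose DNF is c OR (a AND b), so the minimal subsets are {c} and {a, b}. Pruning after every clause keeps the intermediate term set small in this sketch, but the worst-case number of terms can still grow exponentially.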