We consider data sets that consist of n-dimensional binary vectors representing positive and negative examples of some (possibly unknown) phenomenon. A subset S of the attributes (or variables) of such a data set is called a support set if the positive and negative examples can be distinguished using only the attributes in S. In this paper we study the problem of finding small support sets, a task that arises frequently in knowledge discovery, data mining, learning theory, and the logical analysis of data. We study the distribution of support sets in randomly generated data and discuss why finding small support sets is important. We propose several measures of separation (real-valued set functions over the subsets of attributes), formulate optimization models for finding the smallest subsets maximizing these measures, and devise efficient heuristic algorithms for these (typically NP-hard) optimization problems. We prove that several of the proposed heuristics have a guaranteed constant approximation ratio, and we report computational experience comparing these heuristics with others from the literature on both randomly generated and real-world data sets.
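The minimum support set problem reduces naturally to set cover: every pair of one positive and one negative example must be "covered" by at least one attribute on which the two vectors differ, and the classical greedy set-cover heuristic then applies. The sketch below illustrates this reduction; it is an illustrative implementation under that set-cover framing, not the specific measures or algorithms proposed in the paper, and all function names are my own.

```python
def greedy_support_set(positives, negatives):
    """Greedy heuristic for a small support set (set-cover style sketch).

    Each (positive, negative) example pair is an element to cover; an
    attribute covers a pair if the two vectors differ on that attribute.
    Repeatedly pick the attribute covering the most uncovered pairs.
    """
    n = len(positives[0])
    # All positive/negative index pairs start uncovered.
    uncovered = {(i, j)
                 for i in range(len(positives))
                 for j in range(len(negatives))}
    support = []
    while uncovered:
        best, best_cov = None, set()
        for a in range(n):
            cov = {(i, j) for (i, j) in uncovered
                   if positives[i][a] != negatives[j][a]}
            if len(cov) > len(best_cov):
                best, best_cov = a, cov
        if best is None:
            # Some positive vector equals some negative vector, so no
            # subset of attributes can separate them.
            raise ValueError("no support set exists: identical pos/neg vectors")
        support.append(best)
        uncovered -= best_cov
    return support
```

For example, with positives {(1,0,1), (1,1,0)} and negatives {(0,0,1), (0,1,1)}, attribute 0 alone separates the classes (positives have a 1 there, negatives a 0), and the greedy heuristic returns [0]. By the standard set-cover analysis, this greedy rule yields a logarithmic approximation guarantee in the number of example pairs.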