C4.5: programs for machine learning
C4.5: programs for machine learning
Incremental clustering and dynamic information retrieval
STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing
Data mining, hypergraph transversals, and machine learning (extended abstract)
PODS '97 Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Separate-and-Conquer Rule Learning
Artificial Intelligence Review
New methods to color the vertices of a graph
Communications of the ACM
Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey
Data Mining and Knowledge Discovery
R-MINI: An Iterative Approach for Generating Minimal Rules from Examples
IEEE Transactions on Knowledge and Data Engineering
X-means: Extending K-means with Efficient Estimation of the Number of Clusters
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Mining Top.K Frequent Closed Patterns without Minimum Support
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Approximating a collection of frequent sets
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Summarizing itemset patterns: a profile-based approach
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Turning Clusters into Patterns: Rectangle-Based Discriminative Data Description
ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Hyper-rectangle-based discriminative data generalization and applications in data mining
Hyper-rectangle-based discriminative data generalization and applications in data mining
Rule Extraction and Reduction for Hyper Surface Classification
ISNN 2009 Proceedings of the 6th International Symposium on Neural Networks: Advances in Neural Networks - Part II
SPARCL: an effective and efficient algorithm for mining arbitrary shape-based clusters
Knowledge and Information Systems
Summarizing transactional databases with overlapped hyperrectangles
Data Mining and Knowledge Discovery
Reducing test cost of integrated, heterogeneous systems using pass-fail test data analysis
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Hi-index | 0.00 |
In this paper, we introduce and study the Minimum Consistent Subset Cover (MCSC) problem. Given a finite ground set X and a constraint t, find the minimum number of consistent subsets that cover X, where a subset of X is consistent if it satisfies t. The MCSC problem generalizes the traditional set covering problem and has Minimum Clique Partition, a dual problem of graph coloring, as an instance. Many practical data mining problems in the areas of rule learning, clustering, and frequent pattern mining can be formulated as MCSC instances. In particular, we discuss the Minimum Rule Set problem that minimizes model complexity of decision rules as well as some converse k-clustering problems that minimize the number of clusters satisfying certain distance constraints. We also show how the MCSC problem can find applications in frequent pattern summarization. For any of these MCSC formulations, our proposed novel graph-based generic algorithm CAG can be directly applicable. CAG starts by constructing a maximal optimal partial solution, then performs an example-driven specific-to-general search on a dynamically maintained bipartite assignment graph to simultaneously learn a set of consistent subsets with small cardinality covering the ground set. Our experiments on benchmark datasets show that CAG achieves good results compared to existing popular heuristics.