The discovery of subsets with special properties from binary data has been one of the key themes in pattern discovery. Pattern classes such as frequent itemsets stress the co-occurrence of the value 1 in the data. While this choice makes sense in the context of sparse binary data, it disregards potentially interesting subsets of attributes that have some other type of dependency structure. We consider the problem of finding all subsets of attributes that have low complexity. The complexity is measured by either the entropy of the projection of the data on the subset, or the entropy of the data for the subset when modeled using a Bayesian tree, with downward- or upward-pointing edges. We show that the entropy measure on sets has a monotonicity property, and thus a levelwise approach can find all low-entropy itemsets. We also show that the tree-based measures are bounded above by the entropy of the corresponding itemset, allowing similar algorithms to be used for finding low-entropy trees. We describe algorithms for finding all subsets satisfying an entropy condition. We give an extensive empirical evaluation of the performance of the methods on both synthetic and real data. We also discuss the search for high-entropy subsets and the computation of the Vapnik-Chervonenkis dimension of the data.
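To make the levelwise idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the data is a list of equal-length 0/1 tuples and uses the empirical entropy of the projection as the complexity measure. The function names `entropy` and `low_entropy_sets` and the `threshold` parameter are hypothetical, chosen only for illustration. The pruning relies on the monotonicity property: adding an attribute can never decrease the entropy, so a set can be low-entropy only if all of its subsets are. The tree-based measures are not covered here.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(rows, attrs):
    """Empirical entropy (in bits) of the rows projected onto the columns in attrs."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def low_entropy_sets(rows, threshold):
    """Levelwise search for every attribute subset with entropy <= threshold.

    Illustrative sketch only: attribute subsets are sorted tuples of column
    indices, and `threshold` is a hypothetical user-chosen bound in bits.
    """
    n_attrs = len(rows[0])
    found = {}
    candidates = [(a,) for a in range(n_attrs)]
    while candidates:
        # Evaluate the current level and keep the sets under the threshold.
        accepted = set()
        for subset in candidates:
            h = entropy(rows, subset)
            if h <= threshold:
                found[subset] = h
                accepted.add(subset)
        # Build the next level: extend each accepted set by a larger column
        # index, and keep the result only if all of its subsets one element
        # smaller were accepted (monotonicity makes this pruning safe).
        next_level = set()
        for subset in accepted:
            for a in range(subset[-1] + 1, n_attrs):
                cand = subset + (a,)
                if all(s in accepted for s in combinations(cand, len(subset))):
                    next_level.add(cand)
        candidates = sorted(next_level)
    return found


if __name__ == "__main__":
    # Toy data: column 1 copies column 0, column 2 is independent of both.
    rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
    for subset, h in sorted(low_entropy_sets(rows, threshold=1.5).items()):
        print(subset, round(h, 3))
```

In the toy data, the pair of duplicated columns stays under the threshold while any set that includes the independent column alongside another does not, so the three-column candidate is never evaluated.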