The problem of assessing the significance of data mining results on high-dimensional 0-1 datasets has been studied extensively in the literature. For problems such as mining frequent sets and finding correlations, significance testing can be done with standard statistical tests such as the chi-square test, or with other methods. However, the results of such tests depend only on the specific attributes involved, not on the dataset as a whole. Moreover, such tests are difficult to apply to sets of patterns or to other complex results of data mining algorithms. In this article, we consider a simple randomization technique that addresses these shortcomings. The approach consists of producing random datasets that have the same row and column margins as the given dataset, computing the results of interest on the randomized instances, and comparing them to the results on the actual data. This randomization technique can be used to assess the results of many different types of data mining algorithms, such as frequent sets, clustering, and spectral analysis. To generate random datasets with given margins, we use variations of a Markov chain approach based on a simple swap operation, which replaces a 2×2 submatrix of the form [[1,0],[0,1]] with [[0,1],[1,0]] (or vice versa) and therefore leaves every row and column sum unchanged. We give theoretical results on the efficiency of different randomization methods and apply the swap randomization method to several well-known datasets. Our results indicate that for some datasets the structure discovered by the data mining algorithms is expected given the row and column margins, while for other datasets the discovered structure conveys information that is not captured by the margin counts.
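To make the procedure concrete, here is a minimal Python sketch, assuming a small dense 0-1 matrix stored as a list of lists. It implements one basic form of the swap chain (attempted swaps that would break the margins are skipped but still counted as steps), not any specific variant analyzed in the article. The names `swap_randomize`, `count_frequent_pairs`, and `empirical_p_value`, the frequent-pair statistic, the number of swap attempts, and the (1 + hits) / (1 + samples) p-value convention are all illustrative assumptions.

```python
import random
from itertools import combinations


def swap_randomize(matrix, num_attempts, rng):
    """Return a copy of a 0-1 matrix with the same row and column sums.

    Each attempt picks two 1-entries (r1, c1) and (r2, c2). If (r1, c2) and
    (r2, c1) are both 0, the four cells are swapped, which keeps every row
    and column sum fixed. Attempts that fail this check are no-op steps
    (an assumed variant of the chain).
    """
    m = [row[:] for row in matrix]
    ones = [(r, c) for r, row in enumerate(m) for c, v in enumerate(row) if v]
    if len(ones) < 2:
        return m
    for _ in range(num_attempts):
        a, b = rng.sample(range(len(ones)), 2)
        (r1, c1), (r2, c2) = ones[a], ones[b]
        if r1 == r2 or c1 == c2 or m[r1][c2] or m[r2][c1]:
            continue  # swap would change the margins or overwrite a 1
        m[r1][c1] = m[r2][c2] = 0
        m[r1][c2] = m[r2][c1] = 1
        ones[a], ones[b] = (r1, c2), (r2, c1)
    return m


def count_frequent_pairs(matrix, min_support):
    """Hypothetical example statistic: number of column pairs whose
    co-occurrence count (support) is at least min_support."""
    n_cols = len(matrix[0])
    count = 0
    for i, j in combinations(range(n_cols), 2):
        support = sum(1 for row in matrix if row[i] and row[j])
        if support >= min_support:
            count += 1
    return count


def empirical_p_value(matrix, statistic, num_samples, num_attempts, seed=0):
    """Compare the statistic on the real data with its values on
    swap-randomized copies; each copy restarts the chain from the data."""
    rng = random.Random(seed)
    observed = statistic(matrix)
    hits = sum(
        1
        for _ in range(num_samples)
        if statistic(swap_randomize(matrix, num_attempts, rng)) >= observed
    )
    return observed, (hits + 1) / (num_samples + 1)


# Usage on a tiny made-up dataset (rows are transactions, columns are items).
data = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]
observed, p = empirical_p_value(
    data, lambda m: count_frequent_pairs(m, 2), num_samples=99, num_attempts=1000
)
print(observed, p)
```

A small p-value suggests that the statistic is not explained by the row and column margins alone; a large one suggests the same structure would be expected in random data with those margins. How many swap attempts are needed for the chain to mix is not known in advance, so the value of `num_attempts` above is an assumption.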