The problem of assessing the significance of data mining results on high-dimensional 0-1 data sets has been studied extensively in the literature. For problems such as mining frequent sets and finding correlations, significance testing can be done with, e.g., chi-square tests or many other methods. However, the results of such tests depend only on the specific attributes involved and not on the dataset as a whole. Moreover, such tests are difficult to apply to sets of patterns or other complex results of data mining. In this paper, we consider a simple randomization technique that addresses these shortcomings. The approach consists of producing random datasets that have the same row and column margins as the given dataset, computing the results of interest on the randomized instances, and comparing them against the results on the actual data. This randomization technique can be used to assess the results of many different types of data mining algorithms, such as frequent sets, clustering, and rankings. To generate random datasets with given margins, we use variations of a Markov chain approach that is based on a simple swap operation. We give theoretical results on the efficiency of different randomization methods, and apply the swap randomization method to several well-known datasets. Our results indicate that for some datasets the structure discovered by the data mining algorithms is a random artifact, while for other datasets the discovered structure conveys meaningful information.
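The swap operation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It shows one plain variant of a swap-based chain: pick two 1-entries at (r1, c1) and (r2, c2), and if the entries at (r1, c2) and (r2, c1) are both 0, exchange them, which leaves every row and column sum unchanged. The function names `swap_randomize`, `pair_support`, and `empirical_p_value`, the number of swap attempts, the choice of statistic, and the (count + 1) / (samples + 1) p-value estimate are all assumptions made for illustration.

```python
import random


def swap_randomize(matrix, num_attempts, rng):
    """Return a copy of a 0-1 matrix after `num_attempts` attempted swaps.

    Each successful swap preserves all row and column margins.
    `num_attempts` is an assumed tuning parameter, not a value from the paper.
    """
    data = [row[:] for row in matrix]  # work on a copy
    ones = [(r, c) for r, row in enumerate(data)
            for c, v in enumerate(row) if v == 1]
    if len(ones) < 2:
        return data
    for _ in range(num_attempts):
        i = rng.randrange(len(ones))
        j = rng.randrange(len(ones))
        r1, c1 = ones[i]
        r2, c2 = ones[j]
        # A swap needs two distinct rows and two distinct columns.
        if r1 == r2 or c1 == c2:
            continue
        # The two "opposite corners" must currently be 0.
        if data[r1][c2] == 1 or data[r2][c1] == 1:
            continue
        data[r1][c1] = data[r2][c2] = 0
        data[r1][c2] = data[r2][c1] = 1
        ones[i] = (r1, c2)
        ones[j] = (r2, c1)
    return data


def pair_support(matrix, a, b):
    """Hypothetical statistic: number of rows with a 1 in both columns a and b."""
    return sum(1 for row in matrix if row[a] == 1 and row[b] == 1)


def empirical_p_value(matrix, statistic, num_samples=100,
                      num_attempts=1000, seed=0):
    """Fraction of randomized datasets whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = statistic(matrix)
    at_least = 0
    for _ in range(num_samples):
        sample = swap_randomize(matrix, num_attempts, rng)
        if statistic(sample) >= observed:
            at_least += 1
    # Add-one estimate so the p-value is never exactly zero (an assumed convention).
    return (at_least + 1) / (num_samples + 1)


if __name__ == "__main__":
    toy = [
        [1, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 1, 1, 0],
        [1, 0, 1, 1],
        [1, 1, 0, 0],
    ]
    p = empirical_p_value(toy, lambda d: pair_support(d, 0, 1))
    print("empirical p-value for columns 0 and 1:", p)
```

A small p-value here would suggest that the co-occurrence of the two columns is not explained by the row and column margins alone. How many swap attempts are needed for the chain to mix is a question the sketch does not answer.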