Difference detection is of real practical importance. One example is evaluating a new medicine B for a given disease against an old medicine A that has been used to treat the disease for many years. The datasets generated by applying A and B to the disease are called contrast groups. The main differences between the groups are differences in mean and in distribution, which this paper calls structural differences. However, contrast groups are only two samples obtained from a limited number of applications or tests of A and B, and they may contain missing values. The differences derived from the groups are therefore inevitably uncertain. In this paper, we propose a statistically sound approach for measuring this uncertainty by identifying confidence intervals for the structural differences between contrast groups. The method is designed for applications in which the exact data distributions are unknown a priori and the data may contain missing values. We apply our approach to UCI datasets to illustrate its power as a new data mining technique for tasks such as distinguishing spam from non-spam email and benign from malignant breast cancer.
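The abstract does not say how the confidence intervals are computed. As a generic illustration only, and not the authors' method, the sketch below uses the percentile bootstrap. This is one distribution-free way to put a confidence interval on a structural difference between two contrast groups. The sketch covers a mean difference and a distribution difference, with the latter measured here by the two-sample Kolmogorov-Smirnov distance. Several choices are assumptions for this sketch and are not taken from the paper: the function names, the complete-case handling of missing values (`None`/`NaN` are simply dropped), and the defaults `n_boot=2000` and `alpha=0.05`.

```python
import math
import random
from statistics import mean


def drop_missing(values):
    """Complete-case handling (an assumption): discard None and NaN entries."""
    return [v for v in values
            if v is not None and not (isinstance(v, float) and math.isnan(v))]


def mean_diff(a, b):
    """Mean difference between group B and group A."""
    return mean(b) - mean(a)


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x, then compare the CDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def bootstrap_ci(group_a, group_b, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for statistic(A, B); returns (point, (lo, hi))."""
    a, b = drop_missing(group_a), drop_missing(group_b)
    if not a or not b:
        raise ValueError("each group needs at least one observed value")
    rng = random.Random(seed)
    # Resample each group independently with replacement, keeping its size.
    stats = sorted(statistic(rng.choices(a, k=len(a)), rng.choices(b, k=len(b)))
                   for _ in range(n_boot))
    lo = stats[max(0, int(alpha / 2 * n_boot))]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)]
    return statistic(a, b), (lo, hi)
```

The following usage example runs on made-up measurements for medicines A and B, where `None` and `NaN` mark missing values. The same call with `ks_distance` in place of `mean_diff` gives an interval for the distribution difference.

```python
old_a = [1.0, 2.0, None, 3.0]
new_b = [4.0, float("nan"), 5.0, 6.0]
point, (lo, hi) = bootstrap_ci(old_a, new_b, mean_diff)
```

Dropping incomplete records is only unbiased when values are missing completely at random. An imputation-based treatment of the missing values may be more appropriate in practice.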