Estimating confidence intervals for structural differences between contrast groups with missing data

Authors:
Yongsong Qin;Shichao Zhang;Xiaofeng Zhu;Jilian Zhang;Chengqi Zhang
Affiliations:
School of Computer Science and Information Technology, Guangxi Normal University, PR China;School of Computer Science and Information Technology, Guangxi Normal University, PR China and Faculty of Information Technology, University of Technology, Sydney, P.O. Box 123, Broadway NSW 2007, ...;School of Computer Science and Information Technology, Guangxi Normal University, PR China;School of Computer Science and Information Technology, Guangxi Normal University, PR China;Faculty of Information Technology, University of Technology, Sydney, P.O. Box 123, Broadway NSW 2007, Australia
Venue:
Expert Systems with Applications: An International Journal
Year:
2009

Abstract

Difference detection is topical and extremely useful for evaluating a new medicine B against a specified disease by comparing it with an old medicine A that has been used to treat the disease for many years. The datasets generated by applying A and B to the disease are called contrast groups, and the main differences between the groups are the mean and distribution differences, referred to as structural differences in this paper. However, contrast groups are only two samples obtained from a limited number of applications or tests of A and B, and they may contain missing values. The differences derived from the groups are therefore inevitably uncertain. In this paper, we propose a statistically sound approach for measuring this uncertainty by identifying confidence intervals for the structural differences between contrast groups. The method is designed for the many applications in which the exact data distributions are unknown a priori and the data may contain missing values. We apply our approach to UCI datasets to illustrate its power as a new data mining technique for tasks such as distinguishing spam from non-spam emails and benign breast cancer from malignant breast cancer.
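
To make the idea of a confidence interval for a mean difference between two contrast groups concrete, the following is a minimal illustrative sketch. It is not the method proposed in the paper, since the abstract does not specify the estimator. It substitutes a plain percentile bootstrap, which also avoids assuming a particular data distribution, and it handles missing values by simply dropping them, which is only reasonable if values are missing completely at random. The function names (`observed`, `bootstrap_mean_diff_ci`) and the toy data are hypothetical.

```python
import math
import random


def observed(values):
    """Return only the non-missing entries (missing = None or NaN)."""
    return [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]


def mean(xs):
    return sum(xs) / len(xs)


def bootstrap_mean_diff_ci(group_a, group_b, level=0.95, n_boot=2000, seed=0):
    """Percentile-bootstrap CI for mean(B) - mean(A) on observed values.

    Assumption: missing entries are dropped (complete-case analysis);
    this is a stand-in, not the paper's estimator.
    """
    rng = random.Random(seed)
    a = observed(group_a)
    b = observed(group_b)
    if not a or not b:
        raise ValueError("each group needs at least one observed value")

    diffs = []
    for _ in range(n_boot):
        # Resample each group independently, with replacement.
        resampled_a = [rng.choice(a) for _ in a]
        resampled_b = [rng.choice(b) for _ in b]
        diffs.append(mean(resampled_b) - mean(resampled_a))
    diffs.sort()

    alpha = 1.0 - level
    lower = diffs[int(math.floor((alpha / 2) * (n_boot - 1)))]
    upper = diffs[int(math.ceil((1 - alpha / 2) * (n_boot - 1)))]
    return mean(b) - mean(a), lower, upper


# Toy contrast groups with missing values (hypothetical data).
old_medicine_a = [1.0, 2.0, 3.0, None, 2.0]
new_medicine_b = [5.0, 6.0, float("nan"), 7.0, 6.0]
estimate, lower, upper = bootstrap_mean_diff_ci(old_medicine_a, new_medicine_b)
print(f"difference = {estimate:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

An interval that excludes zero suggests the mean difference between the groups is unlikely to be due to sampling variation alone. A comparable interval for a distribution difference would need a different statistic in place of the mean.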