Estimating confidence intervals for structural differences between contrast groups with missing data

  • Authors:
  • Yongsong Qin;Shichao Zhang;Xiaofeng Zhu;Jilian Zhang;Chengqi Zhang

  • Affiliations:
  • School of Computer Science and Information Technology, Guangxi Normal University, PR China;School of Computer Science and Information Technology, Guangxi Normal University, PR China and Faculty of Information Technology, University of Technology, Sydney, P.O. Box 123, Broadway NSW 2007, ...;School of Computer Science and Information Technology, Guangxi Normal University, PR China;School of Computer Science and Information Technology, Guangxi Normal University, PR China;Faculty of Information Technology, University of Technology, Sydney, P.O. Box 123, Broadway NSW 2007, Australia

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2009

Quantified Score

Hi-index 12.05

Abstract

Difference detection is of practical importance and is extremely useful for evaluating a new medicine B for a specified disease by comparing it with an old medicine A that has been used to treat the disease for many years. The datasets generated by applying A and B to the disease are called contrast groups, and the main differences between the groups are the mean and distribution differences, referred to as structural differences in this paper. However, contrast groups are only two samples obtained from limited applications or tests of A and B, and they may contain missing values. Therefore, the differences derived from the groups are inevitably uncertain. In this paper, we propose a statistically sound approach for measuring this uncertainty by identifying the confidence intervals of structural differences between contrast groups. The method is designed specifically for the large class of applications in which the exact data distributions are unknown a priori and the data may contain missing values. We apply our approach to UCI datasets to illustrate its power as a new data mining technique for tasks such as distinguishing spam from non-spam emails and benign breast cancer from malignant breast cancer.
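
To make the idea of a confidence interval for a mean difference between two contrast groups concrete, the following is a minimal sketch in Python. It is not the method proposed in the paper, whose details the abstract does not give. It swaps in a generic percentile bootstrap, which also requires no assumption about the form of the data distribution, and it handles missing values by simply dropping them (complete-case analysis, which is only appropriate if values are missing completely at random). The function names, parameters, and the example values are all hypothetical.

```python
import math
import random


def drop_missing(values):
    """Keep only observed values; None and NaN are treated as missing."""
    return [v for v in values
            if v is not None and not (isinstance(v, float) and math.isnan(v))]


def mean(values):
    return sum(values) / len(values)


def bootstrap_mean_diff_ci(group_a, group_b, level=0.95, n_boot=2000, seed=0):
    """Percentile-bootstrap interval for mean(B) - mean(A).

    Assumption: missing entries are discarded before resampling
    (complete-case analysis); this is an illustrative simplification,
    not the imputation strategy of any particular paper.
    """
    rng = random.Random(seed)
    a = drop_missing(group_a)
    b = drop_missing(group_b)
    if not a or not b:
        raise ValueError("each group needs at least one observed value")

    diffs = []
    for _ in range(n_boot):
        # Resample each group independently, with replacement.
        resample_a = [rng.choice(a) for _ in a]
        resample_b = [rng.choice(b) for _ in b]
        diffs.append(mean(resample_b) - mean(resample_a))

    diffs.sort()
    alpha = 1.0 - level
    lo_idx = int(math.floor((alpha / 2) * (n_boot - 1)))
    hi_idx = int(math.ceil((1 - alpha / 2) * (n_boot - 1)))
    return diffs[lo_idx], diffs[hi_idx]


# Made-up outcome scores for medicines A and B; None and NaN mark missing values.
scores_a = [1.0, 2.0, None, 3.0, float("nan"), 2.5]
scores_b = [11.0, 12.5, 13.0, None, 11.5, 12.0]
low, high = bootstrap_mean_diff_ci(scores_a, scores_b)
print(f"95% interval for mean(B) - mean(A): [{low:.2f}, {high:.2f}]")
```

An interval that excludes zero suggests that the observed mean difference is unlikely to be an artifact of sampling alone. The same resampling loop could be applied to a distribution-level statistic instead of the mean.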