Empirical likelihood confidence intervals for differences between two datasets with missing data

  • Authors:
  • Yongsong Qin;Shichao Zhang

  • Affiliations:
  • School of Computer Science and Information Technology, Guangxi Normal University, Guilin 541004, PR China;School of Computer Science and Information Technology, Guangxi Normal University, Guilin 541004, PR China

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2008

Abstract

Detecting differences between populations (or datasets) is an important research topic in machine learning and a common means of evaluation in applications, for example assessing a new medical product by comparing it with an old one. Previous research has focused on change detection. In this paper, we measure the uncertainty of structural differences between populations, such as differences in means and in distribution functions, using confidence intervals (CIs) obtained via an empirical likelihood approach. We present a statistically sound method for estimating CIs for differences between non-parametric populations with missing values, where the missing values are imputed by simple random hot deck imputation. We illustrate the power of CI estimation as a new machine learning technique on tasks such as distinguishing spam from non-spam emails in the Spambase dataset downloaded from the UCI repository.
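
The imputation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that simple random hot deck imputation means replacing each missing value with a donor drawn at random, with replacement, from the observed values of the same sample. The function name `random_hot_deck_impute`, the use of `None` to mark missing entries, and the toy data are all hypothetical and not taken from the paper. The sketch stops at the point estimate of the mean difference and does not implement the empirical likelihood confidence interval itself.

```python
import random


def random_hot_deck_impute(values, rng):
    """Replace each None with a donor drawn at random (with replacement)
    from the observed values of the same sample. Hypothetical helper."""
    donors = [v for v in values if v is not None]
    if not donors:
        raise ValueError("no observed values available as donors")
    return [v if v is not None else rng.choice(donors) for v in values]


def mean(xs):
    return sum(xs) / len(xs)


# Toy samples (made up for illustration); None marks a missing value.
x = [1.2, None, 0.8, 1.5, None, 1.1]
y = [2.0, 2.4, None, 1.9, 2.2]

rng = random.Random(0)  # fixed seed so the imputation is reproducible
x_imp = random_hot_deck_impute(x, rng)
y_imp = random_hot_deck_impute(y, rng)

# Point estimate of the mean difference computed on the completed samples.
mean_diff = mean(y_imp) - mean(x_imp)
print(x_imp, y_imp, mean_diff)
```

Because imputed values are copies of observed ones, the completed samples are not independent draws, so an interval built on them has to account for the imputation; treating the completed data as if it were fully observed would generally misstate the uncertainty.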