Empirical likelihood confidence intervals for differences between two datasets with missing data

Authors:
Yongsong Qin; Shichao Zhang
Affiliations:
School of Computer Science and Information Technology, Guangxi Normal University, Guilin 541004, PR China (both authors)
Venue:
Pattern Recognition Letters
Year:
2008

Abstract

Detecting differences between populations (or datasets) is an important research topic in machine learning. It is also a common practical means of evaluation, for example when a new medical product is assessed by comparing it with an old one. Previous research has focused on change detection. In this paper, we use an empirical likelihood approach to construct confidence intervals (CIs) that measure the uncertainty of structural differences between populations, such as differences in means and in distribution functions. We present a statistically sound method for estimating CIs for differences between non-parametric populations with missing values, where the missing values are imputed by simple random hot deck imputation. We illustrate the power of CI estimation as a new machine learning technique with an application to distinguishing spam from non-spam emails in the Spambase dataset from the UCI repository.
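The abstract combines two ingredients: simple random hot deck imputation and an empirical likelihood (EL) confidence interval for a difference between populations. The Python sketch below illustrates both for the simplest case, a difference in means, delta = mean(y) - mean(x), between two independent samples. It is not the authors' method. It uses the standard two-sample EL ratio, profiled over the nuisance mean and calibrated by the chi-square(1) quantile, and it treats hot-deck-imputed values as if they had been observed. That ignores the extra variability introduced by imputation, so a calibration that accounts for it (for example, a scaled chi-square cutoff) may be needed in practice. All function names (`hot_deck_impute`, `el_log_ratio_mean`, `el_two_sample_stat`, `el_ci_mean_difference`) and the synthetic data are hypothetical and chosen for illustration. The sketch assumes NumPy and SciPy.

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar
from scipy.stats import chi2


def hot_deck_impute(values, rng):
    """Simple random hot deck: fill each NaN with a value drawn at random,
    with replacement, from the observed values of the same sample."""
    values = np.asarray(values, dtype=float)
    missing = np.isnan(values)
    donors = values[~missing]
    if donors.size == 0:
        raise ValueError("no observed values to use as donors")
    out = values.copy()
    out[missing] = rng.choice(donors, size=int(missing.sum()), replace=True)
    return out


def el_log_ratio_mean(x, mu):
    """-2 log empirical likelihood ratio for the mean of x at the value mu."""
    z = x - mu
    if z.min() >= 0 or z.max() <= 0:
        return np.inf  # mu is not strictly inside the range of the data
    # The Lagrange multiplier lam solves sum(z / (1 + lam*z)) = 0 subject to
    # 1 + lam*z_i > 0 for all i; that sum is decreasing in lam on this interval.
    lo, hi = -1.0 / z.max(), -1.0 / z.min()
    eps = 1e-10 * (hi - lo)
    lam = brentq(lambda t: np.sum(z / (1.0 + t * z)), lo + eps, hi - eps)
    return 2.0 * np.sum(np.log1p(lam * z))


def el_two_sample_stat(x, y, delta):
    """Profile -2 log EL ratio for delta = mean(y) - mean(x): minimise over
    the nuisance mean mu of x, with the mean of y fixed at mu + delta."""
    lo = max(x.min(), y.min() - delta)
    hi = min(x.max(), y.max() - delta)
    if lo >= hi:
        return np.inf  # no mu keeps both means inside their data ranges
    res = minimize_scalar(
        lambda mu: el_log_ratio_mean(x, mu) + el_log_ratio_mean(y, mu + delta),
        bounds=(lo, hi), method="bounded")
    return res.fun


def el_ci_mean_difference(x, y, level=0.95, tol=1e-8):
    """CI endpoints where the profile statistic crosses the chi2(1) cutoff."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    crit = chi2.ppf(level, df=1)
    center = y.mean() - x.mean()  # the statistic is zero at the point estimate
    step0 = np.std(np.concatenate([x - x.mean(), y - y.mean()])) + 1e-12

    def excess(d):
        return el_two_sample_stat(x, y, d) - crit

    def endpoint(direction):
        step = step0
        while excess(center + direction * step) < 0:
            step *= 2.0  # widen until the cutoff is crossed
        inside, outside = center, center + direction * step
        while abs(outside - inside) > tol * step0:  # plain bisection
            mid = 0.5 * (inside + outside)
            if excess(mid) < 0:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    return endpoint(-1.0), endpoint(1.0)


if __name__ == "__main__":
    # Synthetic "old" vs "new" product measurements, about 20% missing at random.
    rng = np.random.default_rng(0)
    old = rng.normal(10.0, 2.0, size=60)
    new = rng.normal(11.0, 2.0, size=60)
    old[rng.random(60) < 0.2] = np.nan
    new[rng.random(60) < 0.2] = np.nan
    lo, hi = el_ci_mean_difference(hot_deck_impute(old, rng),
                                   hot_deck_impute(new, rng))
    print(f"95% EL CI for mean(new) - mean(old): ({lo:.3f}, {hi:.3f})")
```

The Lagrange multiplier is found by a bracketing root search because its estimating equation is monotone on the interval where all EL weights stay positive. The one-sample statistic is convex in the mean, so the profile over the nuisance mean can be handled by a bounded scalar minimiser, and the profiled statistic is convex in delta, which makes the bisection for each endpoint well defined. In principle, a distribution-function difference at a fixed point t could be handled the same way by applying the mean-difference routine to the indicators 1{x <= t} and 1{y <= t}, as long as each sample has values on both sides of t.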