Computational Statistics & Data Analysis
Detecting differences between populations (or datasets) is an important research topic in machine learning. It is also a common means of evaluation in applications, for example assessing a new medical product by comparing it with an old one. Previous research has focused on change detection. In this paper, we measure the uncertainty of structural differences between populations, such as differences in means and in distribution functions, using confidence intervals (CIs) constructed via an empirical likelihood approach. We present a statistically sound method for estimating CIs for differences between non-parametric populations with missing values, which are imputed using simple random hot deck imputation. We illustrate the power of CI estimation as a new machine learning technique by applying it to distinguish spam from non-spam emails in the spambase dataset from the UCI Machine Learning Repository.
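The abstract combines two steps: filling missing values by simple random hot deck imputation, then forming an empirical likelihood (EL) CI for a difference between two populations. The Python sketch below illustrates both steps for the simplest case, a difference in means. It is not the authors' implementation. It rests on these assumptions: missing values are encoded as `None`, each sample is imputed only from its own observed values, the nuisance mean is profiled out numerically (one standard way to build a two-sample EL statistic for a mean difference), and the interval uses the asymptotic chi-square (1 df) calibration usual for EL. All function names and the toy data are hypothetical. The sketch treats imputed values as if they were observed, which generally makes the interval too narrow.

```python
import math
import random

CHI2_1_95 = 3.841458820694124  # 0.95 quantile of chi-square with 1 df


def hot_deck_impute(values, rng):
    """Simple random hot deck: replace each None with a value drawn uniformly,
    with replacement, from the observed values of the same sample."""
    donors = [v for v in values if v is not None]
    if not donors:
        raise ValueError("no observed values available as donors")
    return [v if v is not None else rng.choice(donors) for v in values]


def el_mean_stat(x, mu):
    """-2 log empirical likelihood ratio for the hypothesis mean(x) == mu."""
    d = [xi - mu for xi in x]
    dmin, dmax = min(d), max(d)
    if not dmin < 0.0 < dmax:
        return math.inf  # mu lies outside the convex hull of the sample
    # Weights p_i = 1 / (n * (1 + lam * d_i)) must be positive, bounding lam.
    a, b = -1.0 / dmax, -1.0 / dmin
    lam = 0.0
    for _ in range(100):
        t = [di / (1.0 + lam * di) for di in d]
        g = sum(t)  # g(lam) is strictly decreasing; solve g(lam) == 0
        if g == 0.0:
            break
        if g > 0.0:
            a = lam
        else:
            b = lam
        new = lam + g / sum(ti * ti for ti in t)  # Newton step, g' = -sum t_i^2
        if not a < new < b:
            new = 0.5 * (a + b)  # safeguard: bisect the bracket instead
        converged = abs(new - lam) <= 1e-13 * (1.0 + abs(lam))
        lam = new
        if converged:
            break
    terms = [1.0 + lam * di for di in d]
    if min(terms) <= 0.0:
        return math.inf  # numerically on the boundary of the hull
    return 2.0 * sum(math.log(ti) for ti in terms)


def el_diff_stat(x, y, delta, iters=60):
    """-2 log EL ratio for mean(x) - mean(y) == delta. The nuisance mean
    mu = mean(y) is profiled out; the objective is convex in mu, so a
    golden-section search finds its minimum."""
    lo = max(min(x) - delta, min(y))
    hi = min(max(x) - delta, max(y))
    if not lo < hi:
        return math.inf  # no mu keeps both means inside their data hulls

    def f(mu):
        return el_mean_stat(x, mu + delta) + el_mean_stat(y, mu)

    r = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio
    a, b = lo, hi
    u, v = b - r * (b - a), a + r * (b - a)
    fu, fv = f(u), f(v)
    for _ in range(iters):
        if fu <= fv:  # minimum lies in [a, v]
            b, v, fv = v, u, fu
            u = b - r * (b - a)
            fu = f(u)
        else:  # minimum lies in [u, b]
            a, u, fu = u, v, fv
            v = a + r * (b - a)
            fv = f(v)
    return min(fu, fv)


def el_diff_ci(x, y, crit=CHI2_1_95, tol=1e-6):
    """Asymptotic EL interval {delta : el_diff_stat(x, y, delta) <= crit}."""
    est = sum(x) / len(x) - sum(y) / len(y)

    def edge(inside, outside):
        # The statistic is convex in delta and ~0 at est, so bisect each side.
        while abs(outside - inside) > tol:
            mid = 0.5 * (inside + outside)
            if el_diff_stat(x, y, mid) <= crit:
                inside = mid
            else:
                outside = mid
        return inside

    return est, edge(est, min(x) - max(y)), edge(est, max(x) - min(y))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy samples; None marks a missing value.
    x = hot_deck_impute([5.1, None, 4.8, 6.0, 5.5, None, 5.9, 6.3, 4.9, 5.2], rng)
    y = hot_deck_impute([4.2, 4.9, None, 4.4, 5.0, 4.1, 4.7, None, 4.6, 4.3], rng)
    est, lower, upper = el_diff_ci(x, y)
    print(f"mean difference {est:.3f}, 95% EL CI ({lower:.3f}, {upper:.3f})")
```

The profiling step lets the two samples keep separate EL weights while their weighted means are tied together by `delta`. Because the EL statistic is convex in both the nuisance mean and `delta`, plain golden-section search and bisection are enough here; a library root finder and chi-square quantile function could replace them.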