Bootstrap FDA for counting positives accurately in imprecise environments

Authors:
Jigang Xie;Zhengding Qiu;Zhenjiang Miao;Yanqiang Zhang
Affiliations:
Institute of Information Science, Beijing Jiaotong University, Beijing 100044, PR China and Center of China Merger and Acquisition Research, Beijing Jiaotong University, Beijing 100044, PR China;Institute of Information Science, Beijing Jiaotong University, Beijing 100044, PR China;Institute of Information Science, Beijing Jiaotong University, Beijing 100044, PR China;Institute of Information Science, Beijing Jiaotong University, Beijing 100044, PR China
Venue:
Pattern Recognition
Year:
2007

Citing 5

Applied multivariate statistical analysis
Robust Classification for Imprecise Environments. Machine Learning
Pattern Classification (2nd Edition)
The theoretical analysis of FDA and applications. Pattern Recognition
Counting positives accurately despite inaccurate classification. ECML'05 Proceedings of the 16th European conference on Machine Learning

Cited 1

An asymmetric classifier based on partial least squares. Pattern Recognition

Abstract

Many real-world classification tasks require discriminating between two unbalanced classes in imprecise environments, where either the training data are not a random sample of the target population or the class distribution in the target population may shift over time. In such situations, minimizing misclassification costs requires knowing the class distribution in the target population, because the optimal threshold depends on it. Forman has presented a method for estimating the number of positives in the target population, based on the distribution generated on the training data and the distribution on the unlabeled test data. However, when the data set is small, these distributions are difficult to generate reliably. This paper presents a novel algorithm that generates these distributions using the bootstrap and Fisher discriminant analysis (FDA). Experimental results on five UCI data sets demonstrate its effectiveness.
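
The general idea can be illustrated with a minimal sketch. It is not the paper's algorithm, whose details the abstract does not give. It assumes a mixture-style estimate: each bootstrap resample of the training data is projected onto a Fisher discriminant direction, the class-conditional score histograms are mixed with a proportion p, and the p that best matches the histogram of the unlabeled test scores is kept. All function names, the ridge term, the bin count, the L1 matching criterion, and the median aggregation are assumptions made for illustration.

```python
import numpy as np


def fisher_direction(X_pos, X_neg, ridge=1e-6):
    """Fisher discriminant direction w = Sw^{-1} (mu_pos - mu_neg)."""
    mu_p, mu_n = X_pos.mean(axis=0), X_neg.mean(axis=0)
    # Within-class scatter matrix, summed over both classes.
    Sw = (np.cov(X_pos, rowvar=False) * (len(X_pos) - 1)
          + np.cov(X_neg, rowvar=False) * (len(X_neg) - 1))
    # Small ridge term (an assumption) keeps Sw invertible on tiny samples.
    Sw = np.atleast_2d(Sw) + ridge * np.eye(X_pos.shape[1])
    return np.linalg.solve(Sw, mu_p - mu_n)


def mixture_proportion(h_pos, h_neg, h_test, n_grid=101):
    """Grid-search the proportion p minimizing the L1 gap between
    p * h_pos + (1 - p) * h_neg and the test histogram."""
    grid = np.linspace(0.0, 1.0, n_grid)
    errors = [np.abs(p * h_pos + (1 - p) * h_neg - h_test).sum() for p in grid]
    return float(grid[int(np.argmin(errors))])


def bootstrap_fda_positive_count(X_pos, X_neg, X_test,
                                 n_boot=50, n_bins=10, seed=0):
    """Return (estimated number of positives, estimated proportion)
    for the unlabeled test set X_test."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample each training class with replacement.
        bp = X_pos[rng.integers(0, len(X_pos), len(X_pos))]
        bn = X_neg[rng.integers(0, len(X_neg), len(X_neg))]
        # Project everything onto the direction fitted on this resample.
        w = fisher_direction(bp, bn)
        s_pos, s_neg, s_test = bp @ w, bn @ w, X_test @ w
        # Shared bin edges covering all scores, so each histogram sums to 1.
        scores = np.concatenate([s_pos, s_neg, s_test])
        bins = np.linspace(scores.min(), scores.max(), n_bins + 1)
        h_pos = np.histogram(s_pos, bins)[0] / len(s_pos)
        h_neg = np.histogram(s_neg, bins)[0] / len(s_neg)
        h_test = np.histogram(s_test, bins)[0] / len(s_test)
        estimates.append(mixture_proportion(h_pos, h_neg, h_test))
    # Median over resamples (an assumption) damps unlucky resamples.
    p_hat = float(np.median(estimates))
    return p_hat * len(X_test), p_hat
```

Calling `bootstrap_fda_positive_count(X_pos, X_neg, X_test)` with NumPy arrays of shape (n_samples, n_features) returns the estimated count and proportion of positives in `X_test`. The training class ratio does not enter the estimate, which is what makes this kind of approach usable when the class distribution has shifted.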