Improving classification for microarray data sets by constructing synthetic data

Authors:
Shun Bian;Wenjia Wang
Affiliations:
School of Computing Sciences, University of East Anglia, Norwich, UK;School of Computing Sciences, University of East Anglia, Norwich, UK
Venue:
CIS'05 Proceedings of the 2005 international conference on Computational Intelligence and Security - Volume Part I
Year:
2005

Abstract

Microarray technology is widely used in biological and medical research to observe the expression of large numbers of genes. However, such experiments are usually carried out with few replicates or instances, which can lead to poor modelling and analysis. This paper proposes an approach that improves classification by adding synthetic data. A new algorithm estimates the values of the synthetic data, and ensemble methods label the generated data. Experiments with artificial and real-world data demonstrate that the proposed algorithm can generate synthetic data in the uncertain regions of classifiers, improving the effectiveness and efficiency of classification on microarray data sets.
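
The abstract does not give the details of the algorithm, so the following is only a minimal sketch of the general idea (synthetic instances placed where an ensemble is uncertain, then labelled by the ensemble), not the authors' method. Everything in it is an assumption made for illustration: the function names (`centroid_classifier`, `build_committee`, `synthesize`), the use of nearest-centroid base classifiers trained on bootstrap resamples, candidate generation by linear interpolation between two real instances, and vote disagreement as the uncertainty measure.

```python
import random
from collections import Counter


def centroid_classifier(X, y):
    """Fit a nearest-centroid classifier and return its predict function.

    Assumed base learner for illustration; the paper does not name one here.
    """
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(x):
        # Return the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


def build_committee(X, y, n_members, rng):
    """Train each committee member on a bootstrap resample of the data."""
    n = len(X)
    committee = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]
        committee.append(
            centroid_classifier([X[i] for i in idx], [y[i] for i in idx])
        )
    return committee


def synthesize(X, y, n_candidates=200, n_keep=20, n_members=11, seed=0):
    """Return up to n_keep (instance, label) pairs from uncertain regions.

    Candidates are interpolations between two randomly chosen real instances
    (an assumption). Each candidate is scored by committee disagreement and
    labelled by majority vote; the most contested candidates are kept.
    """
    rng = random.Random(seed)
    committee = build_committee(X, y, n_members, rng)
    scored = []
    for _ in range(n_candidates):
        a, b = rng.sample(X, 2)
        t = rng.random()
        x = [ai + t * (bi - ai) for ai, bi in zip(a, b)]
        votes = Counter(member(x) for member in committee)
        label, top = votes.most_common(1)[0]
        disagreement = 1.0 - top / n_members  # 0.0 means a unanimous vote
        scored.append((disagreement, x, label))
    # Highest disagreement first: these lie closest to the contested region.
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(x, label) for _, x, label in scored[:n_keep]]
```

In this sketch the returned pairs would be appended to the original training set before the final classifier is retrained. With very few real instances, a bootstrap resample may contain only one class; such a member then always votes for that class, which this simple version tolerates rather than corrects.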