Metagenomics is the study of microorganisms by directly extracting and cloning their DNA from the environment, without lab cultivation or isolation of individual genomes. Assembly of metagenomic DNA fragments closely resembles the overlap-layout-consensus procedure used for isolated genomes, but is augmented by an additional binning step that sorts scaffolds, contigs, and unassembled reads into taxonomic groups. In this paper, we used oligonucleotide frequencies as features and developed a hierarchical scheme for the challenging task of binning short metagenome fragments. Principal component analysis (PCA) was used to reduce the high dimensionality of the feature space, and linear discriminant analysis (LDA) was used to design the local classifiers. Simulation results and in silico comparisons with a non-hierarchical classifier are presented to demonstrate the effectiveness and performance of the proposed PCA- and LDA-based hierarchical scheme. The HIER package for this study is available upon request.
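As a rough illustration of the feature step only, the sketch below turns a DNA fragment into a vector of oligonucleotide (k-mer) frequencies. It is not taken from the HIER package: the function name `kmer_frequencies`, the default `k=4`, the normalization to relative frequencies, and the example fragment are all assumptions made for illustration, since the abstract does not state the oligonucleotide length or the normalization used. Vectors of this kind could then be passed to a PCA step followed by an LDA classifier.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=4):
    """Return the relative frequency of every DNA k-mer in `sequence`.

    Hypothetical helper, not part of the HIER package. The result has
    4**k entries in lexicographic k-mer order (AAAA, AAAC, ...).
    Windows containing characters other than A, C, G, T are ignored.
    """
    alphabet = "ACGT"
    sequence = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Normalize over valid k-mers only, so ambiguous bases do not dilute the vector.
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


fragment = "ACGTACGTTGCA"  # made-up fragment, for illustration only
features = kmer_frequencies(fragment, k=2)
print(len(features))
```

With `k=4` each fragment maps to a 256-dimensional vector, and the dimension grows as 4^k, which is the kind of high-dimensional feature space that a PCA step is meant to reduce before classification.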