Applying Noise Handling Techniques to Genomic Data: A Case Study

  • Authors:
  • Choh Man Teng

  • Venue:
  • ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
  • Year:
  • 2003

Abstract

Osteogenesis Imperfecta (OI) is a genetic collagenous disease associated with mutations in one or both of the genes COLIA1 and COLIA2. There are at least four known phenotypes of OI, of which type II is the most severe and often lethal. We identified three approaches to noise handling, namely robust algorithms, filtering, and polishing, and evaluated their effectiveness when applied to the problem of classifying the disease OI based on a data set of amino acid sequences and associated information on point mutations of COLIA1. Preliminary results suggest that each noise-handling mechanism can be useful under different circumstances. Filtering is stable across all cases. Pruning with robust C4.5 increased the classification accuracy in some cases, and polishing gave rise to some additional improvement in classifying the lethal OI phenotype.
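The abstract does not say how filtering was carried out. As a rough illustration only, one common form of filtering removes any training record that a classifier trained on the remaining cross-validation folds misclassifies. The sketch below assumes a hypothetical 1-nearest-neighbour classifier over categorical attributes (standing in for amino acid residues), and the names `nearest_neighbor_predict`, `classification_filter`, and the toy records and labels are invented for illustration; it is not the filter or data used in the study.

```python
from typing import List, Optional, Sequence, Tuple

# A record is (attribute values, class label). Attributes are assumed to be
# categorical, e.g. amino acid residues around a hypothetical mutation site.
Instance = Tuple[Tuple[str, ...], str]


def nearest_neighbor_predict(
    train: Sequence[Instance], attrs: Tuple[str, ...]
) -> Optional[str]:
    """1-nearest-neighbour prediction under Hamming distance.

    Ties are broken in favour of the earliest training record; returns None
    if the training set is empty.
    """
    best_label: Optional[str] = None
    best_dist: Optional[int] = None
    for train_attrs, label in train:
        dist = sum(a != b for a, b in zip(train_attrs, attrs))
        if best_dist is None or dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def classification_filter(data: List[Instance], folds: int = 5) -> List[Instance]:
    """Drop records misclassified by a model trained on the other folds.

    Record i is assigned to fold i % folds; the original order is preserved.
    """
    kept: List[Instance] = []
    for i, (attrs, label) in enumerate(data):
        fold = i % folds
        train = [rec for j, rec in enumerate(data) if j % folds != fold]
        if nearest_neighbor_predict(train, attrs) == label:
            kept.append((attrs, label))
    return kept


# Invented toy data: the fifth record repeats a "lethal" pattern but is
# labelled "nonlethal", standing in for a noisy class label.
toy: List[Instance] = [
    (("G", "A"), "lethal"),
    (("G", "A"), "lethal"),
    (("S", "P"), "nonlethal"),
    (("S", "P"), "nonlethal"),
    (("G", "A"), "nonlethal"),
    (("S", "P"), "nonlethal"),
]

kept = classification_filter(toy, folds=2)
print(len(toy), "->", len(kept))  # 6 -> 5
```

Only the mislabelled record is removed in this toy run. Filtering of this kind discards suspect records outright, so on a small data set every removal also shrinks the training data available to the final classifier.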