DNA Sequence Classification Using Compression-Based Induction

  • Authors:
  • D. Loewenstern; H. Hirsh; M. Noordewier; P. Yianilos

  • Venue:
  • DNA Sequence Classification Using Compression-Based Induction
  • Year:
  • 1995

Abstract

Inductive learning methods, such as neural networks and decision trees, have become a popular approach to developing DNA sequence identification tools. Such methods attempt to form models of a collection of training data that can be used to predict future data accurately. The common approach to using such methods on DNA sequence identification problems forms models that depend on the absolute locations of nucleotides and assume independence of consecutive nucleotide locations. This paper describes a new class of learning methods, called compression-based induction (CBI), that is geared towards sequence learning problems such as those that arise when learning DNA sequences. The central idea is to use text compression techniques on DNA sequences as the means for generalizing
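
The general idea of classifying sequences by compression can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a general-purpose compressor (Python's zlib) as a stand-in, and assigns a query to the class whose training sequences let it be encoded in the fewest additional bytes. The sequences, class labels, and function names are made up for illustration only.

```python
import zlib

# Toy training data: invented sequences, NOT real biological data.
TRAINING = {
    "class_a": [
        "ATGCGTACCTGAATCGGCTAAGTCCATGGTACGATTCAGGCTTAACGGATCCTAGCAATG",
        "GGCATTCGAACTGTCAGGATCCGTTAAGCTAGTCAATGCCGTAGGACTTAGCCATAGGTC",
    ],
    "class_b": [
        "TTTTAAAACCCGGGTTTAAACCGGTTAACCGGAATTCCGGAATTTTCCCCGGGGAAAATT",
        "CACACAGAGAGTGTGTCTCTCACAGAGTGTCTCACAGTGTCACAGAGAGTCTCTGTGTCA",
    ],
}


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of data."""
    return len(zlib.compress(data, 9))


def extra_bytes(corpus: bytes, query: bytes) -> int:
    """Additional compressed bytes needed to encode query after corpus.

    A small value means the compressor could reuse patterns it had
    already seen in the corpus, i.e. the query resembles the corpus.
    """
    return compressed_size(corpus + query) - compressed_size(corpus)


def classify(query: str, training: dict) -> str:
    """Return the label whose training corpus encodes query most cheaply."""
    costs = {
        label: extra_bytes("".join(seqs).encode("ascii"), query.encode("ascii"))
        for label, seqs in training.items()
    }
    return min(costs, key=costs.get)


# Hypothetical query: a fragment taken from the first class_a sequence.
query = TRAINING["class_a"][0][5:55]
print(classify(query, TRAINING))
```

Because the cost depends on substrings shared with the training data rather than on fixed positions, a scheme of this kind does not rely on the absolute locations of nucleotides or on independence between neighbouring positions, which are the two assumptions the abstract identifies in the common approach. A compressor designed for DNA would likely behave differently from zlib.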