Finding optimal parameters for edit distance based sequence classification is NP-hard

  • Authors:
  • Vlado Kešelj;Haibin Liu;Norbert Zeh;Christian Blouin;Chris Whidden

  • Affiliations:
  • Dalhousie University, Halifax, Canada;Dalhousie University, Halifax, Canada;Dalhousie University, Halifax, Canada;Dalhousie University, Halifax, Canada;Dalhousie University, Halifax, Canada

  • Venue:
  • Proceedings of the KDD-09 Workshop on Statistical and Relational Learning in Bioinformatics
  • Year:
  • 2009

Abstract

Parametric edit-distance-based classification has been applied to two significant problems in bioinformatics: biological sequence analysis (DNA, RNA, protein) and semantic relationship extraction from the biomedical scientific literature. The method is based on the edit distance measure on sequences, with parametric costs for matching, mismatching, inserting, and deleting letters. We present a proof that finding optimal parameter values for such classification from training data is NP-hard, which justifies the use of heuristic methods for determining the best parameter values.
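
As an illustration of the measure the abstract describes, the following is a minimal sketch of an edit distance with four tunable costs, computed by the standard dynamic program. The function names, the default cost values, and the nearest-neighbour decision rule in `classify` are assumptions made for this example; the abstract does not specify the classifier used in the paper.

```python
from typing import Iterable, Tuple


def parametric_edit_distance(a: str, b: str,
                             match: float = 0.0, mismatch: float = 1.0,
                             insert: float = 1.0, delete: float = 1.0) -> float:
    """Cost of the cheapest edit script turning a into b under the given costs."""
    m = len(b)
    # Row 0: building b[:j] from the empty string takes j insertions.
    prev = [j * insert for j in range(m + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: reducing a[:i] to the empty string takes i deletions.
        curr = [i * delete] + [0.0] * m
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = min(prev[j - 1] + sub,    # match or mismatch
                          prev[j] + delete,     # delete a[i-1]
                          curr[j - 1] + insert) # insert b[j-1]
        prev = curr
    return prev[m]


def classify(query: str, labeled: Iterable[Tuple[str, str]], **costs: float) -> str:
    """Assumed decision rule: return the label of the nearest training sequence."""
    return min(labeled,
               key=lambda ex: parametric_edit_distance(query, ex[0], **costs))[1]
```

With the default costs this reduces to the ordinary Levenshtein distance. The parameter search that the paper proves NP-hard corresponds to choosing `match`, `mismatch`, `insert`, and `delete` so that a classifier of this kind performs best on the training data, which is why heuristic search over these values is the practical route.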