Frequent Sub-Structure-Based Approaches for Classifying Chemical Compounds

  • Authors:
  • Mukund Deshpande; Michihiro Kuramochi; George Karypis

  • Venue:
  • ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
  • Year:
  • 2003

Abstract

In this paper we study the problem of classifying chemical compound datasets. We present a sub-structure-based classification algorithm that decouples the sub-structure discovery process from the classification model construction and uses frequent subgraph discovery algorithms to find all topological and geometric sub-structures present in the dataset. The advantage of our approach is that during classification model construction, all relevant sub-structures are available allowing the classifier to intelligently select the most discriminating ones. The computational scalability is ensured by the use of highly efficient frequent subgraph discovery algorithms coupled with aggressive feature selection. Our experimental evaluation on eight different classification problems shows that our approach is computationally scalable and on the average, outperforms existing schemes by 10% to 35%.
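
The decoupled pipeline described in the abstract (discover frequent sub-structures first, then select discriminating ones for the classifier) can be illustrated with a minimal sketch. It is not the authors' implementation: labeled paths of at most two bonds stand in for general frequent subgraphs, the class-frequency gap stands in for the paper's feature selection, and all function names, the toy compounds, and the 0.5 support threshold are assumptions made for illustration.

```python
from collections import Counter

def path_features(atoms, bonds, max_len=2):
    """Enumerate labeled simple paths with up to max_len bonds.
    ASSUMPTION: paths are a simplified stand-in for general subgraphs."""
    adj = {i: [] for i in range(len(atoms))}
    for u, v in bonds:
        adj[u].append(v)
        adj[v].append(u)
    found = set()

    def extend(path):
        labels = tuple(atoms[i] for i in path)
        found.add(min(labels, labels[::-1]))  # direction-independent form
        if len(path) - 1 < max_len:
            for nxt in adj[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in adj:
        extend([start])
    return found

def frequent_features(compounds, min_support):
    """Step 1: keep sub-structures occurring in >= min_support of compounds."""
    counts = Counter()
    for atoms, bonds in compounds:
        counts.update(path_features(atoms, bonds))
    n = len(compounds)
    return sorted(f for f, c in counts.items() if c / n >= min_support)

def to_vectors(compounds, features):
    """Step 2: binary vector marking which frequent sub-structures occur."""
    return [[int(f in path_features(a, b)) for f in features]
            for a, b in compounds]

def rank_by_discrimination(vectors, labels, features):
    """Step 3: rank features by the gap in occurrence rate between classes.
    ASSUMPTION: a simple proxy for feature selection, not the paper's method."""
    pos = [v for v, y in zip(vectors, labels) if y == 1]
    neg = [v for v, y in zip(vectors, labels) if y == 0]
    scores = []
    for j, f in enumerate(features):
        p = sum(v[j] for v in pos) / len(pos)
        q = sum(v[j] for v in neg) / len(neg)
        scores.append((abs(p - q), f))
    return sorted(scores, reverse=True)

# Hypothetical toy compounds: (atom labels, bonds as index pairs).
compounds = [
    (["C", "C", "O"], [(0, 1), (1, 2)]),
    (["C", "O"], [(0, 1)]),
    (["C", "C", "N"], [(0, 1), (1, 2)]),
    (["C", "N"], [(0, 1)]),
]
labels = [1, 1, 0, 0]

features = frequent_features(compounds, min_support=0.5)
vectors = to_vectors(compounds, features)
ranking = rank_by_discrimination(vectors, labels, features)
```

Because discovery runs before any class labels are consulted, every frequent sub-structure is available to the ranking step, which is the property the abstract highlights. In practice the ranked features would be passed to a classifier of choice.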