Similarity search for multi-dimensional NMR-Spectra of natural products

  • Authors:
  • Karina Wolfram;Andrea Porzel;Alexander Hinneburg

  • Affiliations:
  • Institute of Computer Science, Martin-Luther-University of Halle-Wittenberg, Germany;Leibniz Institute of Plant Biochemistry (IPB), Germany;Institute of Computer Science, Martin-Luther-University of Halle-Wittenberg, Germany

  • Venue:
  • PKDD'06 Proceedings of the 10th European conference on Principles and Practice of Knowledge Discovery in Databases
  • Year:
  • 2006

Abstract

Searching and mining nuclear magnetic resonance (NMR)-spectra of naturally occurring products is an important task in the investigation of new, potentially useful chemical compounds. We develop a set-based similarity function, which, however, does not sufficiently capture more abstract aspects of similarity. NMR-spectra are like documents, but consist of continuous multi-dimensional points instead of words. Probabilistic latent semantic indexing (PLSI) is a retrieval method that learns hidden topics. We develop several mappings from continuous NMR-spectra to discrete text-like data. The new mappings introduce redundancies into the discrete data, which proves helpful for the PLSI-model used afterwards. Our experiments show that PLSI, which was designed for text data created by humans, can effectively handle the mapped NMR-data originating from natural products. Additionally, PLSI combined with the new mappings is able to find meaningful "topics" in the NMR-data.
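
The idea of mapping continuous peaks to text-like data with built-in redundancy can be illustrated with a minimal sketch. The abstract does not specify the mappings, so everything here is an assumption for illustration: two-dimensional peaks given as (x_ppm, y_ppm) pairs, a regular grid with hypothetical cell sizes, a second grid shifted by half a cell, and the function name peaks_to_words. Each peak emits one "word" per grid, so two peaks that fall on opposite sides of a cell border in one grid can still share a word in the other.

```python
import math
from collections import Counter


def peaks_to_words(peaks, cell=(0.5, 5.0), shifts=(0.0, 0.5)):
    """Map 2D peaks (x_ppm, y_ppm) to a bag of discrete grid-cell 'words'.

    Hypothetical mapping, not the paper's: each entry in `shifts` (a fraction
    of the cell size) defines one grid, and every peak emits one word per
    grid. The overlapping grids are the source of redundancy.
    """
    words = Counter()
    for x, y in peaks:
        for g, s in enumerate(shifts):
            # Index of the grid cell containing the (shifted) peak.
            ix = math.floor((x + s * cell[0]) / cell[0])
            iy = math.floor((y + s * cell[1]) / cell[1])
            words[f"g{g}_{ix}_{iy}"] += 1
    return words


# Made-up peak lists for two similar spectra (illustrative values only).
spectrum_a = [(0.98, 19.9), (3.52, 71.2)]
spectrum_b = [(1.02, 20.1), (3.55, 71.0)]

bag_a = peaks_to_words(spectrum_a)
bag_b = peaks_to_words(spectrum_b)

# Words present in both spectra; these are what a text model would see as overlap.
shared = set(bag_a) & set(bag_b)
print(sorted(shared))
```

In this example the first peak of each spectrum lands in different cells of the unshifted grid but in the same cell of the shifted grid, so the two spectra still share a word for it. The resulting word counts form a spectrum-by-word matrix of the kind a topic model such as PLSI takes as input.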