Efficient calculation of compound similarity based on maximum common subgraphs and its application to prediction of gene transcript levels

Authors:
Rogier J. P. Van Berlo;Wynand Winterbach;Marco J. L. De Groot;Andreas Bender;Peter J. T. Verheijen;Marcel J. T. Reinders;Dick De Ridder
Affiliations:
The Delft Bioinformatics Lab/Kluyver Centre for Genomics of Industrial Fermentation, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4 ...;The Network Architecture and Services Group/The Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, 2628 CD D ...;DSM Biotechnology Center, DSM Food Specialties BV, Alexander Fleminglaan 1, 2613 AX Delft, The Netherlands;Unilever Centre for Molecular Science Informatics Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK;Department of Biotechnology/Kluyver Centre for Genomics of Industrial Fermentation, Faculty of Applied Sciences, Delft University of Technology, Julianalaan 67, 2628 BC Delft, The Netherlands;The Delft Bioinformatics Lab/Kluyver Centre for Genomics of Industrial Fermentation, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4 ...;The Delft Bioinformatics Lab/Kluyver Centre for Genomics of Industrial Fermentation, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4 ...
Venue:
International Journal of Bioinformatics Research and Applications
Year:
2013

Abstract

The physical and biological properties of a chemical entity are related to its structure. Because compound similarity can be used to infer the properties of novel compounds, chemoinformatics has paid much attention to methods for calculating structural similarity. A useful measure of the structural similarity between two compounds is the relative size of their Maximum Common Subgraph (MCS): the largest substructure present in both compounds when they are represented as graphs. In practice, however, such a metric is difficult to employ, because calculating the MCS becomes computationally intractable when the MCS is large. We propose a novel algorithm that significantly reduces the computation time for finding large MCSs compared with a number of state-of-the-art approaches. We demonstrate the algorithm in an application that predicts the transcriptional response of breast cancer cell lines to different drug-like compounds, at a scale that is challenging for the most efficient MCS algorithms to date. In this application, 714 compounds were compared.
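
To make the notion of "relative size of the MCS" concrete, the following is a minimal Python sketch of one textbook way to compute it. It is not the algorithm proposed in the paper. It assumes that compounds are given as lists of atom labels plus bond pairs, that the MCS is found as a maximum clique in the modular product graph (so it is a common induced subgraph and may be disconnected), and that the size is normalized Tanimoto-style over atom counts. The function names and the normalization are illustrative assumptions; the abstract does not specify them. The runtime is exponential in the worst case, which is the intractability the abstract refers to.

```python
from itertools import product


def modular_product(labels_a, edges_a, labels_b, edges_b):
    """Build the modular product of two labelled graphs (hypothetical helper).

    Nodes are pairs (i, j) of atoms with equal labels. Two pairs are adjacent
    when they use distinct atoms in both graphs and agree on whether a bond
    is present between those atoms.
    """
    ea = {frozenset(e) for e in edges_a}
    eb = {frozenset(e) for e in edges_b}
    nodes = [(i, j)
             for i, j in product(range(len(labels_a)), range(len(labels_b)))
             if labels_a[i] == labels_b[j]]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in product(nodes, repeat=2):
        if i != k and j != l:
            if (frozenset((i, k)) in ea) == (frozenset((j, l)) in eb):
                adj[(i, j)].add((k, l))
    return adj


def max_clique(adj):
    """Return one maximum clique using Bron-Kerbosch with pivoting."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return best


def mcs_similarity(labels_a, edges_a, labels_b, edges_b):
    """Assumed Tanimoto-style score: |MCS| / (|A| + |B| - |MCS|), in atoms."""
    mcs = len(max_clique(modular_product(labels_a, edges_a, labels_b, edges_b)))
    denom = len(labels_a) + len(labels_b) - mcs
    return mcs / denom if denom else 0.0


# Heavy-atom graphs of ethanol (C-C-O) and propan-1-ol (C-C-C-O).
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
propanol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
print(mcs_similarity(*ethanol, *propanol))  # prints 0.75
```

Here the common substructure C-C-O has three atoms, so the score is 3 / (3 + 4 - 3) = 0.75. Identical graphs score 1.0 and graphs that share no atom labels score 0.0.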