Quantitative structure-activity relationship (QSAR) models have gained popularity in the pharmaceutical industry due to their potential to substantially decrease drug development costs by reducing expensive laboratory and clinical tests. QSAR modeling consists of two fundamental steps: descriptor discovery and model building. Descriptor discovery methods are either based on chemical domain knowledge (chemoinformatics-based) or purely data-driven (substructure-based). The two families of methods have been developed quite independently, and as a consequence, evaluations involving both types of descriptor discovery method are rarely seen. In this study, a comparative analysis of chemoinformatics-based and substructure-based approaches is presented. Two chemoinformatics-based approaches, ECFI and SELMA, are compared to five approaches for substructure discovery, CP, GraphSig, MFI, MoFa and SUBDUE, using 18 QSAR datasets. The empirical investigation shows that one of the chemoinformatics-based approaches, ECFI, results in significantly more accurate models than all other methods when each descriptor set is used on its own. Results from combining descriptor sets are also presented, showing that adding ECFI descriptors to any other descriptor set improves predictive performance for that set, and that in many cases the performance of ECFI descriptors can itself be improved by adding descriptors generated by the other methods.