Annals of Mathematics and Artificial Intelligence
This paper investigates the trade-off between the expressiveness of the pattern language and the performance of the pattern miner in structured data mining. We study this trade-off in the context of correlated pattern mining, which seeks the k best patterns according to a convex criterion, for five pattern languages: itemsets, multi-itemsets, sequences, trees, and graphs. Performance is measured by the two criteria typical in data mining, computational cost and predictive accuracy, and the application domain is mining molecular graph databases. More specifically, we provide empirical answers to two questions: how does the expressive power of the language affect the computational cost, and what is the trade-off between the expressiveness of the pattern language and the predictive accuracy of the learned model? While answering the first question, we also introduce a novel stepwise approach to correlated pattern mining in which the results of mining a simpler pattern language serve as the starting point for mining a more complex one. This stepwise approach typically yields significant speed-ups (up to a factor of 1000) when mining graphs.
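The abstract describes correlated pattern mining as finding the k best patterns under a convex criterion. The sketch below illustrates that idea for the simplest of the five languages, itemsets; it is not the authors' implementation. Assumptions: the criterion is the chi-squared statistic on a two-class dataset, and pruning uses the standard convexity bound (a refinement covers a subset of the examples covered by its parent, so its (positive, negative) coverage lies in a box whose corners bound any convex score). The function names and the `initial_threshold` parameter are hypothetical; `initial_threshold` only hints at how a stepwise miner could start from a score established earlier (for example, with a simpler language), which is safe only if that seed does not exceed the true k-th best score.

```python
import heapq
from itertools import count


def chi2(x, y, n_pos, n_neg):
    """Chi-squared statistic of a pattern covering x positive and y negative examples.

    Assumes n_pos > 0 and n_neg > 0 (a two-class dataset).
    """
    n = n_pos + n_neg
    total = x + y
    if total == 0 or total == n:
        return 0.0  # pattern covers nothing or everything: no correlation
    # 2x2 contingency table: (covered, not covered) x (positive, negative)
    observed = [x, y, n_pos - x, n_neg - y]
    expected = [total * n_pos / n, total * n_neg / n,
                (n - total) * n_pos / n, (n - total) * n_neg / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def top_k_correlated_itemsets(transactions, labels, k, initial_threshold=0.0):
    """Branch-and-bound search for the k itemsets with the highest chi2 score.

    Returns (ranked list of (itemset, score), number of search nodes visited).
    Only patterns scoring strictly above initial_threshold are reported.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    items = sorted(set().union(*transactions))
    heap = []          # min-heap of (score, tiebreak, itemset); holds the current top-k
    tiebreak = count()
    visited = 0

    def threshold():
        # The k-th best score so far, never below the seeded threshold.
        if len(heap) < k:
            return initial_threshold
        return max(initial_threshold, heap[0][0])

    def expand(prefix, covered, start):
        nonlocal visited
        for i in range(start, len(items)):
            item = items[i]
            new_cov = [t for t in covered if item in transactions[t]]
            if not new_cov:
                continue  # empty cover: this itemset and all supersets score 0
            visited += 1
            x = sum(1 for t in new_cov if labels[t])
            y = len(new_cov) - x
            pattern = prefix + (item,)
            score = chi2(x, y, n_pos, n_neg)
            if score > threshold():
                entry = (score, next(tiebreak), pattern)
                if len(heap) < k:
                    heapq.heappush(heap, entry)
                else:
                    heapq.heapreplace(heap, entry)  # evict the current k-th best
            # Convexity bound: every superset covers (x', y') with x' <= x, y' <= y,
            # so its score is at most the maximum over the corners of that box.
            bound = max(score, chi2(x, 0, n_pos, n_neg), chi2(0, y, n_pos, n_neg))
            if bound > threshold():
                expand(pattern, new_cov, i + 1)

    expand((), list(range(len(transactions))), 0)
    ranked = sorted(heap, key=lambda e: (-e[0], e[1]))
    return [(p, s) for s, _, p in ranked], visited


if __name__ == "__main__":
    transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"c"}, {"c"}]
    labels = [True, True, True, False, False, False]
    result, visited = top_k_correlated_itemsets(transactions, labels, k=3)
    print(result)  # → [(('a',), 6.0), (('a', 'b'), 3.0), (('a', 'c'), 3.0)]
```

On this toy dataset the itemset {b, c} is never evaluated: once the top-3 threshold reaches 3.0, the bound for the subtree rooted at {b} cannot exceed it. The richer languages studied in the paper would replace the item-extension step with sequence, tree, or graph refinement, while the threshold-and-bound logic stays the same.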