Don't be afraid of simpler patterns

Authors:
Björn Bringmann;Albrecht Zimmermann;Luc De Raedt;Siegfried Nijssen
Affiliations:
Institute of Computer Science, Machine Learning Lab, Albert-Ludwigs-University Freiburg, Freiburg, Germany;Institute of Computer Science, Machine Learning Lab, Albert-Ludwigs-University Freiburg, Freiburg, Germany;Institute of Computer Science, Machine Learning Lab, Albert-Ludwigs-University Freiburg, Freiburg, Germany;Institute of Computer Science, Machine Learning Lab, Albert-Ludwigs-University Freiburg, Freiburg, Germany
Venue:
PKDD'06 Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases
Year:
2006

Abstract

This paper investigates the trade-off between the expressiveness of the pattern language and the performance of the pattern miner in structured data mining. We study this trade-off in the context of correlated pattern mining, which is concerned with finding the k best patterns according to a convex criterion, for the pattern languages of itemsets, multi-itemsets, sequences, trees and graphs. The criteria used in our investigation are the typical ones in data mining, computational cost and predictive accuracy, and the domain is that of mining molecular graph databases. More specifically, we provide empirical answers to two questions: How does the expressive power of the language affect the computational cost? And what is the trade-off between the expressiveness of the pattern language and the predictive accuracy of the learned model? While answering the first question, we also introduce a novel stepwise approach to correlated pattern mining in which the results of mining a simpler pattern language are employed as a starting point for mining in a more complex one. This stepwise approach typically leads to significant speed-ups (up to a factor of 1000) for mining graphs.
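
The stepwise idea can be illustrated with a small sketch. This is not the authors' implementation: it assumes chi-squared as the convex criterion, uses single items as a stand-in for the "simpler" language and itemsets as the "more complex" one (the paper goes up to graphs), and all function names and the toy data are hypothetical. The k-th best score found in the simple language seeds the pruning threshold of the second search, so branches whose convex upper bound falls below it are never expanded.

```python
def chi2(p, n, P, N):
    """Chi-squared of the 2x2 table for a pattern covering p of P positives
    and n of N negatives (assumes P > 0 and N > 0)."""
    total, covered = P + N, p + n
    if covered == 0 or covered == total:
        return 0.0
    return total * (p * (N - n) - n * (P - p)) ** 2 / (
        P * N * covered * (total - covered))


def upper_bound(p, n, P, N):
    """Best score any specialization can reach: a specialization covers
    (p', n') with p' <= p and n' <= n, and a convex function attains its
    maximum over that rectangle at one of these two extreme points."""
    return max(chi2(p, 0, P, N), chi2(0, n, P, N))


def mine_top_k(transactions, labels, k, max_size, threshold=0.0):
    """Depth-first top-k itemset search with bound-based pruning.
    Returns (best, evaluated): the k best (score, itemset) pairs and the
    number of candidate patterns that were scored."""
    P = sum(labels)
    N = len(labels) - P
    items = sorted({item for t in transactions for item in t})
    best = []
    evaluated = 0

    def search(prefix, start, tids):
        nonlocal threshold, evaluated
        for idx in range(start, len(items)):
            covered = [t for t in tids if items[idx] in transactions[t]]
            p = sum(labels[t] for t in covered)
            n = len(covered) - p
            pattern = prefix + (items[idx],)
            evaluated += 1
            score = chi2(p, n, P, N)
            if score >= threshold:
                best.append((score, pattern))
                best.sort(reverse=True)
                del best[k:]
                if len(best) == k:
                    # Raise the threshold to the current k-th best score.
                    threshold = max(threshold, best[-1][0])
            # Expand only if some specialization could still qualify.
            if len(pattern) < max_size and upper_bound(p, n, P, N) >= threshold:
                search(pattern, idx + 1, covered)

    search((), 0, list(range(len(transactions))))
    return best, evaluated


def mine_stepwise(transactions, labels, k, max_size):
    """Mine the simple language (single items) first, then reuse its k-th
    best score as the starting threshold for the full itemset search.
    The returned count covers the second, more complex search only."""
    simple_best, _ = mine_top_k(transactions, labels, k, max_size=1)
    # The seed is only a safe lower bound if k simple patterns were found.
    seed = simple_best[-1][0] if len(simple_best) == k else 0.0
    return mine_top_k(transactions, labels, k, max_size, threshold=seed)


# Toy data (hypothetical): each transaction is a set of fragment labels.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "d"},
                {"c", "d"}, {"d"}, {"b", "c", "d"}, {"a", "d"}]
labels = [1, 1, 1, 0, 0, 0, 0, 1]

direct, direct_evals = mine_top_k(transactions, labels, k=2, max_size=3)
stepwise, stepwise_evals = mine_stepwise(transactions, labels, k=2, max_size=3)
print("direct:  ", direct, "candidates scored:", direct_evals)
print("stepwise:", stepwise, "candidates scored:", stepwise_evals)
```

Because every single item is also an itemset, the seeded threshold never exceeds the true k-th best itemset score, so both runs return the same top-k scores while the seeded run scores no more candidates than the unseeded one. On data this small the saving is negligible; the sketch only shows where the saving comes from.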