Generic pattern mining via data mining template library

Authors:
Mohammed J. Zaki;Nilanjana De;Feng Gao;Paolo Palmerini;Nagender Parimi;Jeevan Pathuri;Benjarath Phoophakdee;Joe Urban
Affiliations:
Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY;Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY;Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY;Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY;Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY;Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY;Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY;Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY
Venue:
Proceedings of the 2004 European conference on Constraint-Based Mining and Inductive Databases
Year:
2004

Abstract

Frequent Pattern Mining (FPM) is a powerful paradigm for extracting informative and useful patterns from massive, complex datasets. In this paper we propose the Data Mining Template Library (DMTL), a collection of generic containers and algorithms for data mining, together with persistency and database management classes. DMTL provides a systematic solution to a whole class of common FPM tasks, such as itemset, sequence, tree, and graph mining. It is extensible, scalable, and high-performance, enabling rapid response on massive datasets. A detailed set of experiments shows that DMTL is competitive with special-purpose algorithms designed for a particular pattern type, especially as database sizes increase.
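
To make the idea of one generic algorithm serving several pattern types concrete, here is a minimal sketch of a levelwise frequent-pattern search that is parameterized by the pattern family. It is an illustration only and is not DMTL's actual API: the function names (`mine_frequent`, `itemset_extend`, `sequence_extend`, and so on), the toy data, and the choice of Python are all assumptions made for this example. The same mining loop is reused unchanged for itemsets and for sequences; only the candidate-extension and containment functions differ.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Sequence, TypeVar

P = TypeVar("P", bound=Hashable)  # pattern type (itemset, sequence, ...)
T = TypeVar("T")                  # database record type


def mine_frequent(
    database: Sequence[T],
    seeds: Iterable[P],
    extend: Callable[[P, List[P]], Iterable[P]],
    contains: Callable[[T, P], bool],
    min_support: int,
) -> Dict[P, int]:
    """Hypothetical generic levelwise miner: count, filter, extend, repeat."""
    frequent: Dict[P, int] = {}
    candidates = set(seeds)
    while candidates:
        level: Dict[P, int] = {}
        for cand in candidates:
            # Support = number of records that contain the candidate.
            support = sum(1 for record in database if contains(record, cand))
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        survivors = list(level)
        # Build the next level only from patterns that were frequent.
        candidates = {
            c for p in survivors for c in extend(p, survivors) if c not in frequent
        }
    return frequent


# --- Itemset instantiation (patterns and records are frozensets) ---
def itemset_extend(pattern: frozenset, peers: List[frozenset]) -> Iterable[frozenset]:
    """Join two frequent k-itemsets whose union has exactly k+1 items."""
    for other in peers:
        union = pattern | other
        if len(union) == len(pattern) + 1:
            yield union


def itemset_contains(transaction: frozenset, pattern: frozenset) -> bool:
    return pattern <= transaction


# --- Sequence instantiation (patterns and records are tuples) ---
def sequence_extend(pattern: tuple, peers: List[tuple]) -> Iterable[tuple]:
    """Join when pattern without its first event equals other without its last."""
    for other in peers:
        if pattern[1:] == other[:-1]:
            yield pattern + other[-1:]


def sequence_contains(record: tuple, pattern: tuple) -> bool:
    """True if pattern occurs in record as a (not necessarily contiguous) subsequence."""
    it = iter(record)
    return all(event in it for event in pattern)


# Toy databases, invented for this example.
transactions = [frozenset("abc"), frozenset("abd"), frozenset("acd"), frozenset("ab")]
itemsets = mine_frequent(
    transactions,
    {frozenset(item) for t in transactions for item in t},
    itemset_extend,
    itemset_contains,
    min_support=2,
)

sequences = [("a", "b", "c"), ("a", "c", "b"), ("a", "b")]
seq_patterns = mine_frequent(
    sequences,
    {(event,) for s in sequences for event in s},
    sequence_extend,
    sequence_contains,
    min_support=2,
)
```

Passing the extension and containment steps as parameters is only one way to express this kind of genericity; a template library would more plausibly fix these choices at compile time, which this sketch does not attempt to reproduce.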