An integrated, generic approach to pattern mining: data mining template library

Authors:
Vineet Chaoji;Mohammad Al Hasan;Saeed Salem;Mohammed J. Zaki
Affiliations:
Computer Science Department, Rensselaer Polytechnic Institute, Troy, USA 12180;Computer Science Department, Rensselaer Polytechnic Institute, Troy, USA 12180;Computer Science Department, Rensselaer Polytechnic Institute, Troy, USA 12180;Computer Science Department, Rensselaer Polytechnic Institute, Troy, USA 12180
Venue:
Data Mining and Knowledge Discovery
Year:
2008

Citing 51
Cited 4

Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
CLIP: concept learning from inference patterns

Artificial Intelligence - Special issue: AI research in Japan
Dynamic itemset counting and implication rules for market basket data

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Fast discovery of association rules

Advances in knowledge discovery and data mining
Efficiently mining long patterns from databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Discovering typical structures of documents: a road map approach

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
An Algorithm for Subgraph Isomorphism

Journal of the ACM (JACM)
Data mining: practical machine learning tools and techniques with Java implementations

Data mining: practical machine learning tools and techniques with Java implementations
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Sequence mining in categorical domains: incorporating constraints

Proceedings of the ninth international conference on Information and knowledge management
SPADE: an efficient algorithm for mining frequent sequences

Machine Learning
STL tutorial and reference guide, second edition: C++ programming with the standard template library

STL tutorial and reference guide, second edition: C++ programming with the standard template library
Molecular feature mining in HIV data

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Formal Concept Analysis: Mathematical Foundations

Formal Concept Analysis: Mathematical Foundations
Levelwise Search and Borders of Theories in KnowledgeDiscovery

Data Mining and Knowledge Discovery
Complete Mining of Frequent Patterns from Graphs: Mining Graph Data

Machine Learning
Scalable Algorithms for Association Mining

IEEE Transactions on Knowledge and Data Engineering
Mining Sequential Patterns: Generalizations and Performance Improvements

EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology
Mining Sequential Patterns

ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth

Proceedings of the 17th International Conference on Data Engineering
MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases

Proceedings of the 17th International Conference on Data Engineering
Frequent Subgraph Discovery

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
SPIRIT: Sequential Pattern Mining with Regular Expression Constraints

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
An Efficient Algorithm for Mining Association Rules in Large Databases

VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Efficiently mining frequent trees in a forest

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Sequential PAttern mining using a bitmap representation

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
TreeFinder: a First Step towards XML Data Mining

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
gSpan: Graph-Based Substructure Pattern Mining

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Indexing and Mining Free Trees

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Efficient Data Mining for Maximal Frequent Subtrees

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
CLOSET+: searching for the best strategies for mining frequent closed itemsets

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
CloseGraph: mining closed frequent graph patterns

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Fast vertical mining using diffsets

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Unordered Tree Mining with Applications to Phylogeny

ICDE '04 Proceedings of the 20th International Conference on Data Engineering
BIDE: Efficient Mining of Frequent Closed Sequences

ICDE '04 Proceedings of the 20th International Conference on Data Engineering
HybridTreeMiner: An Efficient Algorithm for Mining Frequent Rooted Trees and Free Trees Using Canonical Forms

SSDBM '04 Proceedings of the 16th International Conference on Scientific and Statistical Database Management
Advances in frequent itemset mining implementations: report on FIMI'03

ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
A quickstart in frequent structure mining can make a difference

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
An Efficient Algorithm for Discovering Frequent Subgraphs

IEEE Transactions on Knowledge and Data Engineering
DRYADE: A New Approach for Discovering Closed Frequent Trees in Heterogeneous Tree Databases

ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
Efficient Algorithms for Mining Closed Itemsets and Their Lattice Structure

IEEE Transactions on Knowledge and Data Engineering
Efficiently Mining Frequent Trees in a Forest: Algorithms and Applications

IEEE Transactions on Knowledge and Data Engineering
Cache-conscious frequent pattern mining on a modern processor

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Out-of-core frequent pattern mining on a commodity PC

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Frequent subgraph mining in outerplanar graphs

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Efficiently Mining Frequent Embedded Unordered Trees

Fundamenta Informaticae - Advances in Mining Graphs, Trees and Sequences
PSTL: a C++ persistent standard template library

COOTS'01 Proceedings of the 6th conference on USENIX Conference on Object-Oriented Technologies and Systems - Volume 6
Data mining using relational database management systems

PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
On horn axiomatizations for sequential data

ICDT'05 Proceedings of the 10th international conference on Database Theory
Towards generic pattern mining

ICFCA'05 Proceedings of the Third international conference on Formal Concept Analysis

A lower bound on the sample size needed to perform a significant frequent pattern mining task

Pattern Recognition Letters
Depth first algorithms and inferencing for AFD mining

IDEAS '09 Proceedings of the 2009 International Database Engineering & Applications Symposium
Output space sampling for graph patterns

Proceedings of the VLDB Endowment
Para Miner: a generic pattern mining algorithm for multi-core architectures

Data Mining and Knowledge Discovery

Quantified Score

Hi-index	0.00

Visualization

Abstract

Frequent pattern mining (FPM) is an important data mining paradigm to extract informative patterns like itemsets, sequences, trees, and graphs. However, no practical framework for integrating the FPM tasks has been attempted. In this paper, we describe the design and implementation of the Data Mining Template Library (DMTL) for FPM. DMTL utilizes a generic data mining approach, where all aspects of mining are controlled via a set of properties. It uses a novel pattern property hierarchy to define and mine different pattern types. This property hierarchy can be thought of as a systematic characterization of the pattern space, i.e., a meta-pattern specification that allows the analyst to specify new pattern types, by extending this hierarchy. Furthermore, in DMTL all aspects of mining are controlled by a set of different mining properties. For example, the kind of mining approach to use, the kind of data types and formats to mine over, the kind of back-end storage manager to use, are all specified as a list of properties. This provides tremendous flexibility to customize the toolkit for various applications. Flexibility of the toolkit is exemplified by the ease with which support for a new pattern can be added. Experiments on synthetic and public dataset are conducted to demonstrate the scalability provided by the persistent back-end in the library. DMTL been publicly released as open-source software ( http://dmtl.sourceforge.net/ ), and has been downloaded by numerous researchers from all over the world.