Computational aspects of mining maximal frequent patterns

Authors:
Guizhen Yang
Affiliations:
Artificial Intelligence Center, SRI International, Menlo Park, CA and Department of Computer Science and Engineering, University at Buffalo, The State University of New York
Venue:
Theoretical Computer Science
Year:
2006

Abstract

In this paper we study the complexity-theoretic aspects of mining maximal frequent patterns, from the perspective of counting the number of all distinct solutions. We present the first formal proof that the problem of counting the number of maximal frequent itemsets in a database of transactions, given an arbitrary support threshold, is #P-complete, thereby providing theoretical evidence that the problem of mining maximal frequent itemsets is NP-hard. We also extend our complexity analysis to other similar data mining problems that deal with complex data structures, such as sequences, trees, and graphs. We investigate several variants of these mining problems in which the patterns of interest are subsequences, subtrees, or subgraphs, and show that the associated problems of counting the number of maximal frequent patterns are all either #P-complete or #P-hard.
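
To make the counting problem in the abstract concrete, here is a minimal brute-force sketch, not the paper's construction. It treats an itemset as frequent when at least `min_support` transactions contain it, and as maximal when no proper superset is also frequent. The function names, the absolute-count form of the threshold, and the choice to ignore the empty itemset are all assumptions made for illustration. The enumeration is exponential in the number of items, which is in keeping with the hardness result stated above.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def maximal_frequent_itemsets(transactions, min_support):
    """Return all maximal frequent itemsets by exhaustive enumeration.

    Assumptions (not from the paper): `min_support` is an absolute count,
    and the empty itemset is never reported.
    """
    items = sorted(set().union(*transactions))
    # Every non-empty itemset whose support meets the threshold.
    frequent = [
        frozenset(candidate)
        for size in range(1, len(items) + 1)
        for candidate in combinations(items, size)
        if support(frozenset(candidate), transactions) >= min_support
    ]
    # Keep only itemsets with no frequent proper superset.
    return [s for s in frequent if not any(s < other for other in frequent)]


def count_maximal_frequent_itemsets(transactions, min_support):
    """The counting version of the problem: how many maximal frequent itemsets exist."""
    return len(maximal_frequent_itemsets(transactions, min_support))


# A small hypothetical database of five transactions over items a, b, c.
DATABASE = [
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ac"),
    frozenset("bc"),
    frozenset("abc"),
]
```

On this toy database, a threshold of 3 yields the three maximal frequent itemsets {a, b}, {a, c}, and {b, c}, while lowering the threshold to 2 leaves {a, b, c} as the only one. The count therefore depends on the threshold and need not grow or shrink monotonically with it.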