The complexity of mining maximal frequent itemsets and maximal frequent patterns

Authors: Guizhen Yang
Affiliations: University at Buffalo, The State University of New York, Buffalo, NY
Venue: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Year: 2004

Abstract

Mining maximal frequent itemsets is one of the most fundamental problems in data mining. In this paper we study the complexity-theoretic aspects of maximal frequent itemset mining from the perspective of counting the number of solutions. We present the first formal proof that the problem of counting the number of distinct maximal frequent itemsets in a database of transactions, given an arbitrary support threshold, is #P-complete, thereby providing strong theoretical evidence that the problem of mining maximal frequent itemsets is NP-hard. This result is of particular interest because the associated decision problem of checking the existence of a maximal frequent itemset is in P.

We also extend our complexity analysis to other similar data mining problems dealing with complex data structures, such as sequences, trees, and graphs, which have attracted intensive research interest in recent years. In these problems, a partial order among frequent patterns can typically be defined so as to preserve the downward closure property, with maximal frequent patterns being those without any successor with respect to this partial order. We investigate several variants of these mining problems in which the patterns of interest are subsequences, subtrees, or subgraphs, and show that the associated problems of counting the number of maximal frequent patterns are all either #P-complete or #P-hard.
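
To make the counting problem concrete, the following is a minimal brute-force sketch in Python, not the paper's method. It assumes transactions are given as sets of single-character items and considers only nonempty itemsets; the function names `support` and `maximal_frequent_itemsets` and the toy database are hypothetical, chosen for illustration. The sketch enumerates every candidate itemset, so its running time is exponential in the number of distinct items.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def maximal_frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of nonempty maximal frequent itemsets.

    Illustrative only: it tests all 2^n - 1 nonempty candidates.
    """
    items = sorted(set().union(*transactions))
    frequent = [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if support(frozenset(c), transactions) >= min_support
    ]
    # An itemset is maximal if no proper superset is also frequent.
    return [s for s in frequent if not any(s < other for other in frequent)]


# Toy database (an assumption for illustration): four transactions over {a, b, c, d}.
transactions = [frozenset("abc"), frozenset("abd"), frozenset("acd"), frozenset("bcd")]
maximal = maximal_frequent_itemsets(transactions, min_support=2)
print(len(maximal), sorted("".join(sorted(s)) for s in maximal))
# prints: 6 ['ab', 'ac', 'ad', 'bc', 'bd', 'cd']
```

In this toy database every pair of items appears in exactly two transactions and every triple in only one, so with a support threshold of 2 the six pairs are exactly the maximal frequent itemsets. Finding that at least one frequent itemset exists takes a single pass over the data, whereas this sketch obtains the count only by exhaustive enumeration.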