Information retrieval
Information retrieval
Mining association rules between sets of items in large databases
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Rapid bushy join-order optimization with Cartesian products
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Inverted files versus signature files for text indexing
ACM Transactions on Database Systems (TODS)
Using a knowledge cache for interactive discovery of association rules
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining frequent patterns without candidate generation
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
The implementation and performance of compressed databases
ACM SIGMOD Record
KDD-Cup 2000 organizers' report: peeling the onion
ACM SIGKDD Explorations Newsletter - Special issue on “Scalable data mining algorithms”
Online Generation of Association Rules
ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering
Discovering Frequent Closed Itemsets for Association Rules
ICDT '99 Proceedings of the 7th International Conference on Database Theory
Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Querying multiple sets of discovered rules
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
A performance study of four index structures for set-valued attributes of low cardinality
The VLDB Journal — The International Journal on Very Large Data Bases
On computing, storing and querying frequent patterns
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining compressed frequent-pattern sets
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Rule interestingness analysis using OLAP operations
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Opportunity map: identifying causes of failure - a deployed data mining system
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
IEEE Transactions on Knowledge and Data Engineering
Towards exploratory hypothesis testing and analysis
ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Hi-index | 0.00 |
Frequent itemset mining is an important problem in the data mining area. Extensive efforts have been devoted to developing efficient algorithms for mining frequent itemsets. However, not much attention is paid on managing the large collection of frequent itemsets produced by these algorithms for subsequent analysis and for user exploration. In this paper, we study three structures for indexing and querying frequent itemsets: inverted files, signature files and CFP-tree. The first two structures have been widely used for indexing general set-valued data. We make some modifications to make them more suitable for indexing frequent itemsets. The CFP-tree structure is specially designed for storing frequent itemsets. We add a pruning technique based on length-2 frequent itemsets to make it more efficient for processing superset queries. We study the performance of the three structures in supporting five types of containment queries: exact match, subset/superset search and immediate subset/superset search. Our results show that no structure can outperform other structures for all the five types of queries on all the datasets. CFP-tree shows better overall performance than the other two structures.