Mining frequent patterns without candidate generation
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Real world performance of association rule algorithms
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Parallel Mining of Association Rules
IEEE Transactions on Knowledge and Data Engineering
H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Frequent term-based text clustering
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Text Document Categorization by Term Association
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Short documents clustering in very large text databases
WISE'06 Proceedings of the 7th international conference on Web Information Systems
Hi-index | 0.00 |
Frequent itemsets mining is a common and useful task in data mining. But most of the current mining algorithms can’t be used in very large text database. In this paper, we propose a novel and efficient parallel algorithm parTFI which is used to find top-k frequent itemsets with specified minimum length in very large text database. Base on a simple data structure H-struct, parTFI uses a novel logical vertical data partition technique to mine top-k frequent itemsets at each mining server parallel. Our performance study shows that when processing very large sparse text database, parTFI outperforms Apriori and FP-growth, two efficient frequent iemsets mining algorithms, even when both are running with the better tuned min_support. Furthermore, by creating H-struct dynamically, parTFI can suit even huge dataset that most other algorithms can’t process.