Automatic text processing: the transformation, analysis, and retrieval of information by computer
Lexical analysis and stoplists
Information retrieval
Efficient parallel data mining for association rules
CIKM '95 Proceedings of the fourth international conference on Information and knowledge management
A Parallel Distributive Join Algorithm for Cube-Connected Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Dynamic itemset counting and implication rules for market basket data
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Efficiently mining long patterns from databases
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Using latent semantic indexing for literature based discovery
Journal of the American Society for Information Science
Mining Text Using Keyword Distributions
Journal of Intelligent Information Systems
Mining frequent patterns without candidate generation
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Depth first generation of long patterns
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
A tree projection algorithm for generation of frequent item sets
Journal of Parallel and Distributed Computing - Special issue on high-performance data mining
Hash based parallel algorithms for mining association rules
DIS '96 Proceedings of the fourth international conference on Parallel and distributed information systems
Multipass algorithms for mining association rules in text databases
Knowledge and Information Systems
Classifying text documents by associating terms with text categories
ADC '02 Proceedings of the 13th Australasian database conference - Volume 5
Parallel Algorithms for Discovery of Association Rules
Data Mining and Knowledge Discovery
Mining association rules using inverted hashing and pruning
Information Processing Letters
Data Mining: An Overview from a Database Perspective
IEEE Transactions on Knowledge and Data Engineering
Efficient Mining of Association Rules in Distributed Databases
IEEE Transactions on Knowledge and Data Engineering
Parallel Mining of Association Rules
IEEE Transactions on Knowledge and Data Engineering
Using a Hash-Based Method with Transaction Trimming for Mining Association Rules
IEEE Transactions on Knowledge and Data Engineering
Scalable Parallel Data Mining for Association Rules
IEEE Transactions on Knowledge and Data Engineering
Scalable Algorithms for Association Mining
IEEE Transactions on Knowledge and Data Engineering
Effect of Data Skewness and Workload Balance in Parallel Data Mining
IEEE Transactions on Knowledge and Data Engineering
MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases
Proceedings of the 17th International Conference on Data Engineering
Fast Parallel Association Rule Mining without Candidacy Generation
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Efficiently Mining Maximal Frequent Itemsets
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
An Efficient Algorithm for Mining Association Rules in Large Databases
VLDB '95 Proceedings of the 21st International Conference on Very Large Data Bases
Sampling Large Databases for Association Rules
VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases
Mining Association Rules in Text Databases Using Multipass with Inverted Hashing and Pruning
ICTAI '02 Proceedings of the 14th IEEE International Conference on Tools with Artificial Intelligence
Distributed mining of maximal frequent itemsets from databases on a cluster of workstations
CCGRID '04 Proceedings of the 2004 IEEE International Symposium on Cluster Computing and the Grid
The Journal of Supercomputing
In this paper, we propose a new algorithm, Parallel Multipass with Inverted Hashing and Pruning (PMIHP), for mining association rules between words in text databases. Text databases differ markedly from retail transaction databases, and existing mining algorithms cannot handle them efficiently because of the large number of itemsets (i.e., sets of words) that must be counted. PMIHP is a parallel version of our Multipass with Inverted Hashing and Pruning (MIHP) algorithm (Holt and Chung, Proc. 14th IEEE Int'l Conf. on Tools with Artificial Intelligence, 2002, pp. 49–56), which was shown to be more efficient than other existing algorithms for mining text databases. PMIHP reduces the communication overhead between miners running on different processors, because the miners mine their local databases asynchronously and prune the global candidates using the Inverted Hashing and Pruning technique. Compared with the well-known Count Distribution algorithm (Agrawal and Shafer, IEEE Trans. Knowl. Data Eng. 8(6):962–969, 1996), PMIHP performs better when mining association rules in large text databases, and when the minimum support level is low, its speedup is superlinear as the number of processors increases. The experiments were performed on a cluster of Linux workstations using a collection of Wall Street Journal articles.
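The abstract names the Inverted Hashing and Pruning technique without spelling out how it prunes. The following is a minimal sketch of the pruning bound, assuming this scheme: each word gets a small table counting, per hash bucket of transaction IDs (TIDs), the documents that contain the word. For a candidate set of words, the sum over buckets of the smallest entry among its words is an upper bound on its support, so a candidate whose bound is below the minimum support can be discarded without scanning the documents. This is not the authors' code. The function names, the `tid % num_buckets` hash, and `merge_tables` (summing per-partition tables, one plausible way to get global bounds when each processor holds a partition with globally unique TIDs) are hypothetical. The multipass partitioning and the asynchronous PMIHP protocol are not modeled.

```python
# Hypothetical sketch of an inverted-hashing support bound; not the authors' implementation.
from itertools import combinations


def build_tid_hash_tables(transactions, num_buckets, first_tid=0):
    """Build one TID hash table per word.

    tables[word][b] counts the transactions containing `word` whose TID hashes
    to bucket b. The hash (tid % num_buckets) is an assumed, illustrative choice.
    """
    tables = {}
    for offset, words in enumerate(transactions):
        bucket = (first_tid + offset) % num_buckets
        for word in set(words):  # count each word at most once per transaction
            tables.setdefault(word, [0] * num_buckets)[bucket] += 1
    return tables


def merge_tables(*partition_tables):
    """Sum per-partition tables bucket by bucket.

    Assumes every partition uses the same num_buckets and globally unique TIDs.
    """
    merged = {}
    for tables in partition_tables:
        for word, counts in tables.items():
            if word not in merged:
                merged[word] = list(counts)
            else:
                merged[word] = [a + b for a, b in zip(merged[word], counts)]
    return merged


def support_upper_bound(itemset, tables):
    """Upper bound on the number of transactions containing every word in itemset.

    Such a transaction falls in exactly one bucket and is counted in each word's
    entry for that bucket, so per bucket their number is at most the minimum
    entry. Summing the per-bucket minima bounds the true support from above.
    """
    rows = [tables[word] for word in itemset]  # words must appear in tables
    return sum(min(bucket_counts) for bucket_counts in zip(*rows))


def prune_candidates(candidates, tables, min_support):
    """Keep only candidates whose bound can still reach min_support."""
    return [c for c in candidates if support_upper_bound(c, tables) >= min_support]


# Usage: four tiny "documents", two buckets, candidate word pairs.
docs = [
    ["market", "stock", "price"],
    ["stock", "price"],
    ["market", "bond"],
    ["bond", "price"],
]
tables = build_tid_hash_tables(docs, num_buckets=2)
pairs = list(combinations(sorted(tables), 2))
print(prune_candidates(pairs, tables, min_support=2))
# [('bond', 'price'), ('bond', 'stock'), ('price', 'stock')]
```

Here three of the six candidate pairs are pruned without touching the documents. The bound is safe but loose: ('bond', 'stock') survives with a bound of 2 even though no document contains both words, because unrelated documents share buckets, so an exact counting pass is still needed for the survivors.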