Efficient mining of association rules in text databases

Authors:
John D. Holt;Soon M. Chung
Affiliations:
Department of Computer Science and Engineering, Wright State University, Dayton, Ohio;Department of Computer Science and Engineering, Wright State University, Dayton, Ohio
Venue:
Proceedings of the Eighth International Conference on Information and Knowledge Management
Year:
1999


Abstract

In this paper, we propose two new algorithms for mining association rules between words in text databases. The characteristics of text databases are quite different from those of retail transaction databases, and existing mining algorithms cannot handle text databases efficiently because of the large number of itemsets (i.e., sets of words) that need to be counted. Two well-known mining algorithms, the Apriori algorithm and the Direct Hashing and Pruning (DHP) algorithm, are evaluated in the context of mining text databases and are compared with the proposed algorithms, named Multipass-Apriori (M-Apriori) and Multipass-DHP (M-DHP). The proposed algorithms are shown to perform better on large text databases.
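
For readers unfamiliar with the baseline, the following is a minimal sketch of classic Apriori frequent-itemset counting, treating each document as a transaction and each distinct word as an item. It is not M-Apriori or M-DHP, whose details the abstract does not give. The function name `apriori`, its parameters `documents` and `min_support`, and the whitespace tokenization are assumptions made for illustration only.

```python
from itertools import combinations


def apriori(documents, min_support):
    """Return {word set: document count} for every word set that
    appears in at least min_support documents (illustrative sketch)."""
    # Each document becomes a transaction: the set of its distinct words.
    transactions = [frozenset(doc.lower().split()) for doc in documents]

    # Pass 1: count single words.
    counts = {}
    for words in transactions:
        for word in words:
            key = frozenset([word])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join and prune: keep a k-word candidate only if every one of
        # its (k-1)-word subsets was frequent in the previous pass.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Counting pass: one scan over all documents per itemset size.
        counts = {c: 0 for c in candidates}
        for words in transactions:
            for c in candidates:
                if c <= words:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

Even in this sketch, the number of candidates counted in each pass grows with the vocabulary size, which suggests why text collections, with many distinct words per document, stress this kind of algorithm more than retail data does.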