Many efforts have been devoted to coupling data mining activities with relational DBMSs, but true integration into the relational DBMS kernel has rarely been achieved. This paper presents a novel indexing technique that represents transactions in a succinct form, appropriate for tightly integrating frequent itemset mining into a relational DBMS. The data representation is complete, i.e., no support threshold is enforced, so that the index can be reused for mining itemsets with any support threshold. Furthermore, the stored information is structured to allow selective access to only those index blocks needed for the current extraction phase. The index has been implemented in the PostgreSQL open source DBMS and exploits its physical-level access methods. Experiments have been run on various datasets characterized by different data distributions. The execution time of the frequent itemset extraction task exploiting the index is always comparable with, and sometimes faster than, that of a C++ implementation of the FP-growth algorithm accessing data stored in a flat file.
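To make the idea of a complete, threshold-free representation concrete, here is a minimal sketch in Python. It is not the index structure proposed in the paper and has nothing to do with PostgreSQL internals: it assumes a plain in-memory inverted index (item to set of transaction ids) and a simple level-wise intersection of those sets. The names `build_index` and `frequent_itemsets`, the `max_size` parameter, and the toy transactions are all hypothetical. The point it illustrates is only that a representation built without any support threshold can be reused for extractions at different thresholds.

```python
from itertools import combinations


def build_index(transactions):
    """Map every item to the ids of the transactions containing it.

    No support threshold is applied here, so the index stays complete.
    (Illustrative in-memory structure, not the paper's on-disk index.)
    """
    index = {}
    for tid, items in enumerate(transactions):
        for item in items:
            index.setdefault(item, set()).add(tid)
    return index


def frequent_itemsets(index, min_support, max_size=3):
    """Return {itemset: support} for itemsets with support >= min_support.

    Level-wise search: the transaction-id set of a k-itemset is the
    intersection of the id sets of two of its (k-1)-subsets.
    """
    current = {
        frozenset([item]): tids
        for item, tids in index.items()
        if len(tids) >= min_support
    }
    result = {itemset: len(tids) for itemset, tids in current.items()}
    size = 1
    while current and size < max_size:
        size += 1
        next_level = {}
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != size or union in next_level:
                continue
            tids = current[a] & current[b]
            if len(tids) >= min_support:
                next_level[union] = tids
        result.update({itemset: len(tids) for itemset, tids in next_level.items()})
        current = next_level
    return result


# Toy data (assumed for illustration only).
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]

index = build_index(transactions)          # built once, no threshold
high = frequent_itemsets(index, 3)         # first extraction
low = frequent_itemsets(index, 2)          # same index, lower threshold
```

The index is built once and queried twice with different thresholds. Had infrequent items been pruned at build time, the second, lower-threshold extraction would have required rebuilding the structure from the raw transactions.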