Integrating association rule mining with relational database systems: alternatives and implications

Authors:
Sunita Sarawagi;Shiby Thomas;Rakesh Agrawal
Affiliations:
IBM Almaden Research Center, 650 Harry Road, San Jose, CA;Dept. of Computer & Information Science & Engineering, University of Florida, Gainesville and IBM Almaden Research Center, 650 Harry Road, San Jose, CA;IBM Almaden Research Center, 650 Harry Road, San Jose, CA
Venue:
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Year:
1998

Citing 15
Cited 128

Understanding the new SQL: a complete guide

Understanding the new SQL: a complete guide
Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Object-oriented extensions in SQL3: a status report

SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Using the new DB2: IBM's object-relational database system

Using the new DB2: IBM's object-relational database system
A database perspective on knowledge discovery

Communications of the ACM
Dynamic itemset counting and implication rules for market basket data

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Advances in knowledge discovery and data mining

Advances in knowledge discovery and data mining
Fast discovery of association rules

Advances in knowledge discovery and data mining
Query flocks: a generalization of association-rule mining

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Parallel Mining of Association Rules

IEEE Transactions on Knowledge and Data Engineering
Mining Sequential Patterns: Generalizations and Performance Improvements

EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology
Set-Oriented Mining for Association Rules in Relational Databases

ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
Mining Generalized Association Rules

VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Sampling Large Databases for Association Rules

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
A New SQL-like Operator for Mining Association Rules

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases

Optimization of constrained frequent set queries with 2-variable constraints

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Exploratory mining via constrained frequent set queries

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
The PanQ tool and EMF SQL for complex data management

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Extending complex ad-hoc OLAP

Proceedings of the eighth international conference on Information and knowledge management
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Mining and visualizing recommendation spaces for elliptic PDEs with continuous attributes

ACM Transactions on Mathematical Software (TOMS) - Special issue in honor of John Rice's 65th birthday
Fault-tolerant, load-balancing queries in telegraph

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
The segment support map: scalable mining of frequent itemsets

ACM SIGKDD Explorations Newsletter - Special issue on “Scalable data mining algorithms”
Systems support for scalable data mining

ACM SIGKDD Explorations Newsletter - Special issue on “Scalable data mining algorithms”
SQL database primitives for decision tree classifiers

Proceedings of the tenth international conference on Information and knowledge management
Efficient runtime generation of association rules

Proceedings of the tenth international conference on Information and knowledge management
Scalable frequent-pattern mining methods: an overview

Tutorial notes of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
SchemaSQL: An extension to SQL for multidatabase interoperability

ACM Transactions on Database Systems (TODS)
Constrained frequent pattern mining: a pattern-growth view

ACM SIGKDD Explorations Newsletter
Exploiting succinct constraints using FP-trees

ACM SIGKDD Explorations Newsletter
A Survey of Methods for Scaling Up Inductive Algorithms

Data Mining and Knowledge Discovery
MSQL: A Query Language for Database Mining

Data Mining and Knowledge Discovery
Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications

Data Mining and Knowledge Discovery
Bottom-Up Association Rule Mining in Relational Databases

Journal of Intelligent Information Systems - Special issue on data warehousing and knowledge discovery
Constraint-Based, Multidimensional Data Mining

Computer
Scalable Algorithms for Association Mining

IEEE Transactions on Knowledge and Data Engineering
DEMON: Mining and Monitoring Evolving Data

IEEE Transactions on Knowledge and Data Engineering
Algorithms and applications for universal quantification in relational databases

Information Systems - Special issue: Best papers from EDBT 2002
Web Log Mining and Parallel SQL Based Execution

DNIS '00 Proceedings of the International Workshop on Databases in Networked Information Systems
Incremental Mining of Constrained Associations

HiPC '00 Proceedings of the 7th International Conference on High Performance Computing
A Requirements Analysis for Parallel KDD Systems

IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
Using SQL to Build New Aggregates and Extenders for Object-Relational Systems

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
The 3W Model and Algebra for Unified Data Mining

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Automatic Semantic Object Discovery and Mapping from Non-normalised Relational Database Systems

ADVIS '00 Proceedings of the First International Conference on Advances in Information Systems
Mining Generalized Association Rule Using Parallel RDB Engine on PC Cluster

DaWaK '99 Proceedings of the First International Conference on Data Warehousing and Knowledge Discovery
Performance Evaluation and Optimization of Join Queries for Association Rule Mining

DaWaK '99 Proceedings of the First International Conference on Data Warehousing and Knowledge Discovery
SQL Based Association Rule Mining Using Commercial RDBMS (IBM DB2 UBD EEE)

DaWaK 2000 Proceedings of the Second International Conference on Data Warehousing and Knowledge Discovery
On the Equivalence of Top-Down and Bottom-Up Data Mining in Relational Databases

DaWaK '01 Proceedings of the Third International Conference on Data Warehousing and Knowledge Discovery
Decision Tree Modeling with Relational Views

ISMIS '02 Proceedings of the 13th International Symposium on Foundations of Intelligent Systems
Integrating Data Mining with Relational DBMS: A Tightly-Coupled Approach

NGIT '99 Proceedings of the 4th International Workshop on Next Generation Information Technologies and Systems
Parallel SQL Based Association Rule Mining on Large Scale PC Cluster: Performance Comparison with Directly Coded C Implementation

PAKDD '99 Proceedings of the Third Pacific-Asia Conference on Methodologies for Knowledge Discovery and Data Mining
Making Knowledge Extraction and Reasoning Closer

PAKDD '00 Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Current Issues and New Applications
Efficient Rule Retrieval and Postponed Restrict Operations for Association Rule Mining

PAKDD '02 Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
SETM*-MaxK: An Efficient SET-Based Approach to Find the Largest Itemset

PAKDD '02 Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Data Mining of Generalized Association Rules Using a Method of Partial-Match Retrieval

DS '99 Proceedings of the Second International Conference on Discovery Science
Parallel and Distributed Data Mining: An Introduction

Revised Papers from Large-Scale Parallel Data Mining, Workshop on Large-Scale Parallel KDD Systems, SIGKDD
User-Defined Aggregates in Database Languages

DBPL '99 Revised Papers from the 7th International Workshop on Database Programming Languages: Research Issues in Structured and Semistructured Database Programming
Data mining tasks and methods: scalability

Handbook of data mining and knowledge discovery
A query-driven interesting rule discovery using associations and spanning operations

Data mining, rough sets and granular computing
Privacy conflicts in CRM services for online shops: a case study

CRPIT '14 Proceedings of the IEEE international conference on Privacy, security and data mining - Volume 14
Efficient Mining for Association Rules with Relational Database Systems

IDEAS '99 Proceedings of the 1999 International Symposium on Database Engineering & Applications
Processing frequent itemset discovery queries by division and set containment join operators

DMKD '03 Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Mining and visualizing recommendation spaces for PDE solvers: the continuous attributes case

Computational science, mathematics and software
Forecasting Association Rules Using Existing Data Sets

IEEE Transactions on Knowledge and Data Engineering
Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach

Data Mining and Knowledge Discovery
Fast vertical mining using diffsets

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Efficient dynamic mining of constrained frequent sets

ACM Transactions on Database Systems (TODS)
SQL based frequent pattern mining without candidate generation

Proceedings of the 2004 ACM symposium on Applied computing
Memory-adaptive association rules mining

Information Systems - Databases: Creation, management and utilization
Horizontal aggregations for building tabular data sets

Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Efficient Disk-Based K-Means Clustering for Relational Databases

IEEE Transactions on Knowledge and Data Engineering
Extracting predicates from mining models for efficient query evaluation

ACM Transactions on Database Systems (TODS)
Programming the K-means clustering algorithm in SQL

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
FS-Miner: efficient and incremental mining of frequent sequence patterns in web logs

Proceedings of the 6th annual ACM international workshop on Web information and data management
Index Support for Frequent Itemset Mining in a Relational DBMS

ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Knowledge Discovery with Genetic Programming for Providing Feedback to Courseware Authors

User Modeling and User-Adapted Interaction
A native extension of SQL for mining data streams

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Depth-first frequent itemset mining in relational databases

Proceedings of the 2005 ACM symposium on Applied computing
CanTree: A Tree Structure for Efficient Incremental Mining of Frequent Patterns

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Integrating K-Means Clustering with a Relational DBMS Using SQL

IEEE Transactions on Knowledge and Data Engineering
An e-customer behavior model with online analytical mining for internet marketing planning

Decision Support Systems
MauveDB: supporting model-based user views in database systems

Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Vector and matrix operations programmed with UDFs in a relational DBMS

CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Expressive power of an algebra for data mining

ACM Transactions on Database Systems (TODS)
Comprehensive data warehouse exploration with qualified association-rule mining

Decision Support Systems
Using grouping variables to express complex decision support queries

Data & Knowledge Engineering
CanTree: a canonical-order tree for incremental frequent-pattern mining

Knowledge and Information Systems
Mining association rules in very large clustered domains

Information Systems
Building statistical models and scoring with UDFs

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Data Management in the Worldwide Sensor Web

IEEE Pervasive Computing
A new approach to mine frequent patterns using item-transformation methods

Information Systems
COMBI-operator - database support for data mining applications

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Query languages and data models for database sequences and data streams

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Computing frequent itemsets inside Oracle 10G

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Toward supporting real-time mining for data residing on enterprise systems

Expert Systems with Applications: An International Journal
Cost-based query optimization for complex pattern mining on multiple databases

EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
An XML-enabled data mining query language: XML-DMQL

International Journal of Business Intelligence and Data Mining
Efficient online mining of large databases

International Journal of Business Information Systems
Fast detection of database system abuse behaviors based on data mining approach

Proceedings of the 2nd international conference on Scalable information systems
Designing an inductive data stream management system: the stream mill experience

SSPS '08 Proceedings of the 2nd international workshop on Scalable stream processing system
Index-BitTableFI: An improved algorithm for mining frequent itemsets

Knowledge-Based Systems
DataJewel: Integrating Visualization with Temporal Data Mining

Visual Data Mining
DB-FSG: An SQL-Based Approach for Frequent Subgraph Mining

DEXA '08 Proceedings of the 19th international conference on Database and Expert Systems Applications
Architecture of a Database System

Foundations and Trends in Databases
Efficient OLAP with UDFs

Proceedings of the ACM 11th international workshop on Data warehousing and OLAP
Identifying appropriate methodologies and strategies for vertical mining with incomplete data

WSEAS Transactions on Computers
Vertical mining with incomplete data

MAMECTIS'08 Proceedings of the 10th WSEAS international conference on Mathematical methods, computational techniques and intelligent systems
Implementing Multi-relational Mining with Relational Database Systems

KES '09 Proceedings of the 13th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems: Part II
Data mining in deductive databases using query flocks

Expert Systems with Applications: An International Journal
XML data mining

Software—Practice & Experience
Splash: ad-hoc querying of data and statistical models

Proceedings of the 13th International Conference on Extending Database Technology
Fast UDFs to compute sufficient statistics on large data sets exploiting caching and sampling

Data & Knowledge Engineering
New concepts for parallel object-relational query processing

New concepts for parallel object-relational query processing
Performance evaluation and analysis of K-way join variants for association rule mining

BNCOD'03 Proceedings of the 20th British national conference on Databases
Two-phase data warehouse optimized for data mining

BIRTE'06 Proceedings of the 1st international conference on Business intelligence for the real-time enterprises
DWMiner: a tool for mining frequent item sets efficiently in data warehouses

VECPAR'06 Proceedings of the 7th international conference on High performance computing for computational science
Discovering frequent itemsets in the presence of highly frequent items

INAP'01 Proceedings of the Applications of prolog 14th international conference on Web knowledge management and decision support
Processing sequential patterns in relational databases

Journal on data semantics VIII
Novel alarm correlation analysis system based on association rules mining in telecommunication networks

Information Sciences: an International Journal
Preprocessing expert system for mining association rules in telecommunication networks

Expert Systems with Applications: An International Journal
Multi-relational pattern mining system for general database systems

KES'10 Proceedings of the 14th international conference on Knowledge-based and intelligent information and engineering systems: Part III
θ-Constrained multi-dimensional aggregation

Information Systems
A new logic correlation rule for HIV-1 protease mutation

Expert Systems with Applications: An International Journal
Relational languages and data models for continuous queries on sequences and data streams

ACM Transactions on Database Systems (TODS)
Explanation-based auditing

Proceedings of the VLDB Endowment
Mining interesting XML-enabled association rules with templates

KDID'04 Proceedings of the Third international conference on Knowledge Discovery in Inductive Databases
A new approach to generate frequent patterns from enterprise databases

ICAPR'05 Proceedings of the Third international conference on Advances in Pattern Recognition - Volume Part I
Data mining using relational database management systems

PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Enhanced DB-Subdue: supporting subtle aspects of graph mining using a relational approach

PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Programming relational databases for Itemset mining over large transactional tables

EPIA'05 Proceedings of the 12th Portuguese conference on Progress in Artificial Intelligence
Processing sequential patterns in relational databases

DaWaK'05 Proceedings of the 7th international conference on Data Warehousing and Knowledge Discovery
Mining least relational patterns from multi relational tables

ADMA'05 Proceedings of the First international conference on Advanced Data Mining and Applications
Bitmap index-based decision trees

ISMIS'05 Proceedings of the 15th international conference on Foundations of Intelligent Systems
A relational query primitive for constraint-based pattern mining

Proceedings of the 2004 European conference on Constraint-Based Mining and Inductive Databases
Generic pattern mining via data mining template library

Proceedings of the 2004 European conference on Constraint-Based Mining and Inductive Databases
A framework for SQL-Based mining of large graphs on relational databases

PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
SQL based frequent pattern mining with FP-Growth

INAP'04/WLP'04 Proceedings of the 15th international conference on Applications of Declarative Programming and Knowledge Management, and 18th international conference on Workshop on Logic Programming
Mining databases and data streams with query languages and rules

KDID'05 Proceedings of the 4th international conference on Knowledge Discovery in Inductive Databases
Shaping SQL-Based frequent pattern mining algorithms

KDID'05 Proceedings of the 4th international conference on Knowledge Discovery in Inductive Databases
Distributed methodology of cantree construction

MIWAI'11 Proceedings of the 5th international conference on Multi-Disciplinary Trends in Artificial Intelligence
Mop: An Efficient Algorithm for Mining Frequent Pattern with Subtree Traversing

Fundamenta Informaticae
Semantic knowledge integration to support inductive query optimization

DaWaK'07 Proceedings of the 9th international conference on Data Warehousing and Knowledge Discovery
Can we analyze big data inside a DBMS?

Proceedings of the sixteenth international workshop on Data warehousing and OLAP

Quantified Score

Hi-index	0.01

Abstract

Data mining on large data warehouses is becoming increasingly important. In support of this trend, we consider a spectrum of architectural alternatives for coupling mining with database systems. These alternatives include: loose coupling through a SQL cursor interface; encapsulation of a mining algorithm in a stored procedure; caching the data to a file system on the fly and mining it (Cache-Mine); tight coupling using primarily user-defined functions; and SQL implementations for processing in the DBMS. We comprehensively study the option of expressing the mining algorithm in the form of SQL queries, using association rule mining as a case in point. We consider four options in SQL-92 and six options in SQL enhanced with object-relational extensions (SQL-OR). Our evaluation of the different architectural alternatives shows that, from a performance perspective, the Cache-Mine option is superior, although the performance of the SQL-OR option is within a factor of two. Both the Cache-Mine and the SQL-OR approaches incur a higher storage penalty than the loose-coupling approach, which is a factor of 3 to 4 slower than Cache-Mine. The SQL-92 implementations were too slow to qualify as a competitive option. We also compare these alternatives on the basis of qualitative factors such as automatic parallelization, ease of development, portability, and interoperability.
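
To make the idea of expressing the mining algorithm as SQL queries concrete, the following is a minimal sketch of support counting for itemsets of size one and two using plain joins and GROUP BY. It is an illustration only and not one of the SQL-92 or SQL-OR formulations evaluated in the paper. The table name `sales`, its columns `tid` and `item`, the function name `frequent_itemsets_sql`, and the use of SQLite as a stand-in DBMS are all assumptions made for this example.

```python
import sqlite3

# Hypothetical schema: sales(tid, item), one row per item in a transaction.
FREQUENT_ITEMS_SQL = """
SELECT item, COUNT(*) AS support
FROM sales
GROUP BY item
HAVING COUNT(*) >= :minsup
ORDER BY item
"""

# Self-join on tid enumerates item pairs within a transaction; the
# a.item < b.item condition keeps each pair once. Only items that are
# themselves frequent are joined, which mirrors the usual pruning step.
FREQUENT_PAIRS_SQL = """
SELECT a.item, b.item, COUNT(*) AS support
FROM sales AS a
JOIN sales AS b ON a.tid = b.tid AND a.item < b.item
WHERE a.item IN (SELECT item FROM sales GROUP BY item
                 HAVING COUNT(*) >= :minsup)
  AND b.item IN (SELECT item FROM sales GROUP BY item
                 HAVING COUNT(*) >= :minsup)
GROUP BY a.item, b.item
HAVING COUNT(*) >= :minsup
ORDER BY a.item, b.item
"""


def frequent_itemsets_sql(transactions, min_support):
    """Count frequent items and frequent pairs inside the database.

    transactions: dict mapping a transaction id to an iterable of items.
    min_support: minimum number of transactions an itemset must occur in.
    Returns (frequent_items, frequent_pairs) as lists of tuples.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (tid INTEGER, item TEXT)")
        rows = [
            (tid, item)
            for tid, items in transactions.items()
            for item in set(items)  # ignore repeated items in a transaction
        ]
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        params = {"minsup": min_support}
        items = conn.execute(FREQUENT_ITEMS_SQL, params).fetchall()
        pairs = conn.execute(FREQUENT_PAIRS_SQL, params).fetchall()
        return items, pairs
    finally:
        conn.close()


# Small made-up data set for illustration.
EXAMPLE_TRANSACTIONS = {
    1: ["bread", "milk"],
    2: ["bread", "milk", "eggs"],
    3: ["milk", "eggs"],
    4: ["bread", "eggs", "milk"],
}
```

Calling `frequent_itemsets_sql(EXAMPLE_TRANSACTIONS, 3)` on the made-up data reports all three items as frequent and keeps the pairs (bread, milk) and (eggs, milk), each with support 3. Larger itemsets would need further joins, one per additional item, which hints at why join-heavy formulations can become expensive.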