Processing sequential patterns in relational databases

Authors:
Xuequn Shang;Kai-Uwe Sattler
Affiliations:
Department of Computer Science, University of Magdeburg, Magdeburg, Germany;Department of Computer Science and Automation, Technical University of Ilmenau
Venue:
DaWaK'05 Proceedings of the 7th international conference on Data Warehousing and Knowledge Discovery
Year:
2005

Abstract

Database integration of data mining has gained popularity, and its significance is well recognized. However, the performance of SQL-based data mining is known to fall behind that of specialized implementations because of the prohibitive cost of extracting knowledge and the lack of suitable declarative query language support. Recent studies have found that, for association rule mining and sequential pattern mining, carefully tuned SQL formulations can achieve performance comparable to that of systems that cache the data in files outside the DBMS. However, most previous pattern mining methods follow the Apriori approach, which still encounters problems when a sequential database is large and/or when the sequential patterns to be mined are numerous and long. In this paper, we present a novel SQL-based approach that we recently proposed, called Prospad (PROjection Sequential PAttern Discovery). Prospad differs fundamentally from the Apriori-like candidate generation-and-test approach: it is a pattern-growth approach without candidate generation. It grows longer patterns from shorter ones by successively projecting the sequential table into subsequential tables. Since the projected table for a sequential pattern i contains all and only the information necessary for mining the sequential patterns that can grow from i, the size of the projected table usually shrinks quickly as mining proceeds to longer patterns. Moreover, a depth-first approach is used to facilitate the projection process and to avoid the cost of creating and dropping some temporary tables.
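
The projection idea in the abstract can be illustrated with a small sketch. It is not the Prospad implementation and does not reproduce the paper's SQL. It assumes a simplified setting in which every event holds a single item, it uses Python's standard `sqlite3` module as a stand-in DBMS, and the table layout `seq(sid, pos, item)`, the temporary table names `proj_<depth>`, and the function names are all hypothetical. For each frequent item the sketch builds a projected table that keeps only the rows following the item's first occurrence in each sequence, then recurses depth first, so only one projected table per depth exists at any time.

```python
import sqlite3


def mine_patterns(sequences, min_sup):
    """Return {pattern_tuple: support} for single-item sequential patterns."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical layout: one row per (sequence id, position, item).
    conn.execute("CREATE TABLE seq (sid INTEGER, pos INTEGER, item TEXT)")
    conn.executemany(
        "INSERT INTO seq VALUES (?, ?, ?)",
        [(sid, pos, item)
         for sid, items in enumerate(sequences, start=1)
         for pos, item in enumerate(items, start=1)],
    )
    found = {}
    _grow(conn, "seq", (), min_sup, found, depth=0)
    conn.close()
    return found


def _grow(conn, table, prefix, min_sup, found, depth):
    # Support of an item = number of distinct sequences in the current
    # (projected) table that contain it. Table names are generated
    # internally, never taken from user input.
    frequent = conn.execute(
        f"SELECT item, COUNT(DISTINCT sid) FROM {table} "
        "GROUP BY item HAVING COUNT(DISTINCT sid) >= ? ORDER BY item",
        (min_sup,),
    ).fetchall()
    for item, support in frequent:
        pattern = prefix + (item,)
        found[pattern] = support
        # Project: keep only rows after the first occurrence of `item`
        # in each sequence of the current table.
        proj = f"proj_{depth}"
        conn.execute(
            f"CREATE TEMP TABLE {proj} (sid INTEGER, pos INTEGER, item TEXT)"
        )
        conn.execute(
            f"INSERT INTO {proj} "
            f"SELECT s.sid, s.pos, s.item FROM {table} s "
            f"JOIN (SELECT sid, MIN(pos) AS first_pos FROM {table} "
            "      WHERE item = ? GROUP BY sid) f "
            "ON s.sid = f.sid AND s.pos > f.first_pos",
            (item,),
        )
        # Depth first: finish all extensions of `pattern` before moving on,
        # then drop the projected table so its name can be reused.
        _grow(conn, proj, pattern, min_sup, found, depth + 1)
        conn.execute(f"DROP TABLE {proj}")


if __name__ == "__main__":
    demo = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"]]
    for pattern, support in sorted(mine_patterns(demo, min_sup=2).items()):
        print(pattern, support)
```

No candidate sequences are generated in this sketch: each longer pattern comes directly from counting items inside a projected table, and that table can only shrink as the prefix grows.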