Mining association rules between sets of items in large databases
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
SPADE: an efficient algorithm for mining frequent sequences
Machine Learning
Mining Sequential Patterns: Generalizations and Performance Improvements
EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology
PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth
Proceedings of the 17th International Conference on Data Engineering
Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach
Data Mining and Knowledge Discovery
Mining Web Log Sequential Patterns with Position Coded Pre-Order Linked WAP-Tree
Data Mining and Knowledge Discovery
PLWAP sequential mining: open source code
Proceedings of the 1st international workshop on open source data mining: frequent pattern mining implementations
Position coded pre-order linked WAP-tree for web log sequential pattern mining
PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
A taxonomy of sequential pattern mining algorithms
ACM Computing Surveys (CSUR)
Hi-index | 0.00 |
Position Coded Pre-order Linked Web Access Pattern (PLWAP) mining algorithm is one of the existing efficient web sequential pattern mining algorithms, which stores the frequent sequences of the entire sequential database in a compressed tree form with position coded nodes. However, for very long sequences exceeding thirty two nodes, the number of bits an integer position code can hold, the PLWAP algorithm's performance begins to degrade because it employs linked lists to store conjunctions of long position codes and the linked list traversals slow down the algorithm both during tree construction and mining. PLWAP algorithm also uses each and every node in the frequent 1-item event queue to test for that event inclusion in the suffix tree root set during mining. This paper proposes (1) using a different position code numbering scheme where each node is assigned two numeric codes (startPosition, endPosition) instead of one, (2) using pre-knowledge of "Last Descendant" of each tree branch to lower the cost of creating the suffix tree root sets during mining. Experiments show that the proposed new scheme, the PLWAPLong outperforms the PLWAP for long sequences and large databases as well as regular databases.