Prism: An effective approach for frequent sequence mining via prime-block encoding

Authors: Karam Gouda; Mosab Hassaan; Mohammed J. Zaki
Affiliations: Mathematics Dept., Faculty of Science, Benha, Egypt; Mathematics Dept., Faculty of Science, Benha, Egypt; Computer Science Dept., Rensselaer Polytechnic Institute, Troy, NY, USA
Venue: Journal of Computer and System Sciences
Year: 2010

References (12)

Algorithms on strings, trees, and sequences: computer science and computational biology
SPADE: an efficient algorithm for mining frequent sequences. Machine Learning
Discovery of Frequent Episodes in Event Sequences. Data Mining and Knowledge Discovery
Mining Sequential Patterns: Generalizations and Performance Improvements. EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology
Mining Sequential Patterns. ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth. Proceedings of the 17th International Conference on Data Engineering
The PSP Approach for Mining Sequential Patterns. PKDD '98 Proceedings of the Second European Symposium on Principles of Data Mining and Knowledge Discovery
Sequential PAttern mining using a bitmap representation. Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
BIDE: Efficient Mining of Frequent Closed Sequences. ICDE '04 Proceedings of the 20th International Conference on Data Engineering
A generic motif discovery algorithm for sequential data. Bioinformatics
Prism: A Primal-Encoding Approach for Frequent Sequence Mining. ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
On horn axiomatizations for sequential data. ICDT'05 Proceedings of the 10th international conference on Database Theory

Cited by (3)

MSGPs: a novel algorithm for mining sequential generator patterns. ICCCI'12 Proceedings of the 4th international conference on Computational Collective Intelligence: technologies and applications - Volume Part II
An effective algorithm for mining closed sequential patterns and their minimal generators based on prefix trees. International Journal of Intelligent Information and Database Systems
Frequent patterns mining in multiple biological sequences. Computers in Biology and Medicine

Abstract

Sequence mining is one of the fundamental data mining tasks. In this paper we present Prism, a novel approach for mining frequent sequences. Prism uses a vertical approach for enumeration and support counting, based on the novel notion of primal block encoding, which in turn is based on prime factorization theory. Through an extensive evaluation on both synthetic and real datasets, we show that Prism outperforms popular sequence mining methods such as SPADE [M.J. Zaki, SPADE: An efficient algorithm for mining frequent sequences, Mach. Learn. J. 42 (1/2) (Jan/Feb 2001) 31-60], PrefixSpan [J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, M.-C. Hsu, PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth, in: Int'l Conf. Data Engineering, April 2001] and SPAM [J. Ayres, J.E. Gehrke, T. Yiu, J. Flannick, Sequential pattern mining using bitmaps, in: SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, July 2002] by an order of magnitude or more.
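
The abstract does not spell out how primal block encoding works, so the following is only an illustrative sketch of the general idea that prime factorization can stand in for a small bit vector. It assumes blocks of four positions and the prime set {2, 3, 5, 7}; the block size, the prime set, and the helper names `encode_block`, `decode_block`, and `intersect` are all assumptions made for this example, not details taken from the paper. Each set position contributes its prime to a product, and the greatest common divisor of two such products encodes the positions set in both blocks.

```python
from math import gcd

# Assumption for illustration: one prime per position in a 4-position block.
PRIMES = (2, 3, 5, 7)


def encode_block(bits):
    """Encode a block of 0/1 flags as the product of the primes at set positions."""
    if len(bits) != len(PRIMES):
        raise ValueError("block length must match the number of primes")
    value = 1  # an empty block (no positions set) is encoded as 1
    for bit, prime in zip(bits, PRIMES):
        if bit:
            value *= prime
    return value


def decode_block(value):
    """Recover the 0/1 flags by testing divisibility by each prime."""
    return [1 if value % prime == 0 else 0 for prime in PRIMES]


def intersect(a, b):
    """Intersect two encoded blocks: the gcd keeps exactly the shared primes."""
    return gcd(a, b)


a = encode_block([1, 0, 1, 1])  # 2 * 5 * 7 = 70
b = encode_block([1, 1, 0, 1])  # 2 * 3 * 7 = 42
common = intersect(a, b)        # gcd(70, 42) = 14 = 2 * 7
print(common, decode_block(common))  # 14 [1, 0, 0, 1]
```

Because every prime appears at most once in an encoded value, the gcd behaves like a bitwise AND on the underlying block, which is the kind of operation a vertical method would need when joining occurrence lists for support counting.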