S2MP: similarity measure for sequential patterns

Authors:
Hassan Saneifar;Sandra Bringay;Anne Laurent;Maguelonne Teisseire
Affiliations:
University of Montpellier, France;University of Montpellier, France;University of Montpellier, France;University of Montpellier, France
Venue:
AusDM '08 Proceedings of the 7th Australasian Data Mining Conference - Volume 87
Year:
2008

Citing 12

Matching and indexing sequences of different lengths
CIKM '97 Proceedings of the sixth international conference on Information and knowledge management

Mining Sequential Patterns
ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering

PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth
Proceedings of the 17th International Conference on Data Engineering

A Scalable Algorithm for Clustering Sequential Data
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining

SPIRIT: Sequential Pattern Mining with Regular Expression Constraints
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases

Pattern-Oriented Hierarchical Clustering
ADBIS '99 Proceedings of the Third East European Conference on Advances in Databases and Information Systems

Mining Frequent Sequential Patterns under a Similarity Constraint
IDEAL '02 Proceedings of the Third International Conference on Intelligent Data Engineering and Automated Learning

ADMIT: anomaly-based data mining for intrusions
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining

Similarity of event sequences
TIME '97 Proceedings of the 4th International Workshop on Temporal Representation and Reasoning (TIME '97)

An Efficient Algorithm to Compute Differences between Structured Documents
IEEE Transactions on Knowledge and Data Engineering

Approximate mining of consensus sequential patterns

Mining unexpected multidimensional rules
Proceedings of the ACM tenth international workshop on Data warehousing and OLAP

Cited 4

Partial Symbol Ordering Distance
MDAI '09 Proceedings of the 6th International Conference on Modeling Decisions for Artificial Intelligence

Discovering novelty in gene data: from sequential patterns to visualization
ISVC'10 Proceedings of the 6th international conference on Advances in visual computing - Volume Part III

Effective next-items recommendation via personalized sequential pattern mining
DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part II

Sequential patterns mining and gene sequence visualization to discover novelty from microarray data
Journal of Biomedical Informatics

Abstract

In data mining, computing the similarity of objects is an essential task, for example to identify regularities or to build homogeneous clusters of objects. For sequential data, which arise in many application fields (e.g. series of customer purchases, Internet navigation), comparing the similarity of sequences is very important. Existing similarity measures such as edit distance and LCS (longest common subsequence) are suited to simple sequences, but they are not relevant for complex sequences composed of sets of items, as is the case for sequential patterns. In this paper, we propose a new similarity measure, S2MP, that takes the characteristics of sequential patterns into account. S2MP is an adjustable measure: the importance given to each characteristic of sequential patterns can be set according to context, which is not the case for existing measures. We evaluated the accuracy and quality of S2MP against edit distance by using both in a clustering of sequential patterns. The results show that the clusters obtained with S2MP are more homogeneous, and also more precise and more complete than the clusters obtained using edit distance. The experiments also show that S2MP is efficient in terms of computation time and memory usage.
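
To illustrate the limitation the abstract attributes to edit distance, here is a minimal Python sketch. It does not implement S2MP, whose definition is not given in this record. It assumes a unit-cost Levenshtein distance in which each itemset is treated as one atomic symbol; the helper `itemsets` and the example patterns `p1`, `p2`, `p3` are hypothetical names chosen for illustration only.

```python
from typing import FrozenSet, Hashable, List, Sequence


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Unit-cost Levenshtein distance over sequences of hashable symbols."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def itemsets(*groups: str) -> List[FrozenSet[str]]:
    """Hypothetical helper: each string becomes one itemset of single-letter items."""
    return [frozenset(g) for g in groups]


# Three illustrative sequential patterns (sequences of itemsets).
p1 = itemsets("ab", "c", "de")
p2 = itemsets("ab", "c", "df")  # last itemset shares the item d with p1
p3 = itemsets("ab", "c", "xy")  # last itemset shares nothing with p1

d12 = edit_distance(p1, p2)
d13 = edit_distance(p1, p3)
print(d12, d13)
```

Under these assumptions both distances are equal, because an itemset either matches exactly or counts as a full substitution. The partial overlap between the last itemsets of `p1` and `p2` is invisible to the measure, which is the kind of information a similarity measure designed for sequential patterns would need to take into account.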