A New Similarity Metric for Sequential Data

Authors:
Pradeep Kumar; Bapi S. Raju; P. Radha Krishna
Affiliations:
Indian Institute of Management, India; University of Hyderabad, India; Infosys Technologies Limited, Hyderabad, India
Venue:
International Journal of Data Warehousing and Mining
Year:
2010

Abstract

In many data mining applications, both classification and clustering algorithms require a distance or similarity measure. For similarity-based clustering and classification of sequential data, the central problem is choosing an appropriate similarity metric. Existing metrics such as Euclidean, Jaccard, and Cosine do not explicitly exploit the sequential nature of the data. In this paper, the authors propose a similarity-preserving function called the Sequence and Set Similarity Measure (S3M), which captures both the order in which items occur in a sequence and the items that make up the sequence. The authors evaluate the measure on two benchmark datasets: DARPA'98 for a classification task in intrusion detection, and msnbc for a clustering task in web mining. The results show that the proposed measure is useful for both tasks.
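
The abstract does not give the formula for S3M, so the following Python sketch only illustrates the general idea of combining order information with set information. It assumes, hypothetically, that the order component is the length of the longest common subsequence normalized by the longer sequence's length, that the set component is the Jaccard similarity of the item sets, and that the two are mixed by a weight p. The function names, the weight p, and the normalization are assumptions made for illustration and should not be read as the authors' definition.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the best subsequence that ends before both items.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of the two neighbours.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def jaccard(a, b):
    """Jaccard similarity of the item sets of a and b (order is ignored)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # assumption: two empty sequences count as identical
    return len(set_a & set_b) / len(union)


def s3m_sketch(a, b, p=0.5):
    """Hypothetical order-plus-content similarity in [0, 1].

    p weights the order component and (1 - p) the set component.
    This weighting scheme is an assumption, not taken from the abstract.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    longest = max(len(a), len(b))
    order_part = lcs_length(a, b) / longest if longest else 1.0
    return p * order_part + (1.0 - p) * jaccard(a, b)
```

Under these assumptions, the sequences a, b, c and c, b, a receive a Jaccard similarity of 1 because they contain the same items, while the order component drops to 1/3 because their longest common subsequence has length 1. A combined score can therefore separate sequences that a purely set-based metric treats as identical.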