In many data mining applications, both classification and clustering algorithms require a distance or similarity measure. The central problem in similarity-based clustering and classification of sequential data is choosing an appropriate similarity metric. Existing metrics such as Euclidean, Jaccard, and Cosine do not explicitly exploit the sequential nature of the data. In this paper, the authors propose a similarity-preserving function called the Sequence and Set Similarity Measure (S3M), which captures both the order in which items occur in sequences and the constituent items of those sequences. Experiments were conducted on two benchmark datasets: DARPA'98 for a classification task in the intrusion detection domain, and msnbc for a clustering task in the web mining domain. The results demonstrate the usefulness of the proposed measure for both tasks.
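To make the idea of combining order and content concrete, here is a minimal Python sketch. The abstract does not state the formula, so everything below is an assumption for illustration: the order component is taken to be the length of the longest common subsequence normalised by the longer sequence, the content component is the Jaccard similarity of the item sets, and the two are mixed with a hypothetical weight `p`. The function names `lcs_length`, `jaccard`, and `s3m_sketch` are likewise illustrative and not taken from the paper.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def jaccard(a, b):
    """Jaccard similarity of the item sets of a and b (order is ignored)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # assumption: two empty sequences count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def s3m_sketch(a, b, p=0.5):
    """Assumed form: p * (order similarity) + (1 - p) * (content similarity)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    longest = max(len(a), len(b))
    # Order component: LCS length normalised by the longer sequence.
    order_sim = lcs_length(a, b) / longest if longest else 1.0
    return p * order_sim + (1.0 - p) * jaccard(a, b)


if __name__ == "__main__":
    forward = ["home", "news", "sports"]
    backward = ["sports", "news", "home"]
    # Same items, opposite order: the set component is 1.0, the order component is not.
    print(s3m_sketch(forward, forward))
    print(s3m_sketch(forward, backward))
```

Under these assumptions, two sessions that visit the same pages in reverse order score lower than two identical sessions, which a purely set-based measure such as Jaccard cannot distinguish. Raising `p` puts more weight on order; lowering it puts more weight on shared items.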