Algorithms on strings, trees, and sequences: computer science and computational biology
Algorithms on strings, trees, and sequences: computer science and computational biology
Global partial orders from sequential data
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining the Web: Discovering Knowledge from HyperText Data
Mining the Web: Discovering Knowledge from HyperText Data
ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
Mining All Non-derivable Frequent Itemsets
PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery
Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
CloseGraph: mining closed frequent graph patterns
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach
IEEE Transactions on Knowledge and Data Engineering
Mining Graph Data
Discovering Frequent Closed Partial Orders from Strings
IEEE Transactions on Knowledge and Data Engineering
Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications)
Frequent pattern mining: current status and future directions
Data Mining and Knowledge Discovery
Searching for Gapped Palindromes
CPM '08 Proceedings of the 19th annual symposium on Combinatorial Pattern Matching
Power-Law Distributions in Empirical Data
SIAM Review
Tell me what i need to know: succinctly summarizing data with itemsets
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Ranking Web-Based Partial Orders by Significance Using a Markov Reference Model
ICDM '11 Proceedings of the 2011 IEEE 11th International Conference on Data Mining
Towards generic pattern mining
ICFCA'05 Proceedings of the Third international conference on Formal Concept Analysis
Detection of HTTP-GET attack with clustering and information theoretic measurements
FPS'12 Proceedings of the 5th international conference on Foundations and Practice of Security
Hi-index | 0.00 |
In this paper we discuss an interesting and useful property of clickstream data. Often a visit includes repeated views of the same page. We show that in three real datasets, sampled from the websites of technology and consulting groups and a news broadcaster, page repetitions occur for the majority as a very specific structure, namely in the form of nested palindromes. This can be explained by the widespread use of features which are available in any web browser: the "refresh" and "back" buttons. Among the types of patterns which can be mined from sequence data, many either stumble if symbol repetitions are involved, or else fail to capture interesting aspects related to symbol repetitions. In an attempt to remedy this, we characterize the palindromic structures, and discuss possible ways of making use of them. One way is to pre-process the sequence data by explicitly inserting these structures, in order to obtain a richer output from conventional mining algorithms. Another application we discuss is to use the information directly, in order to analyze certain aspects of the website under study. We also provide the simple linear-time algorithm which we developed to identify and extract the structures from our data.